For this assignment, we will be exploring the Citrus Disease dataset. This dataset is composed of observations regarding citrus fruits that are infected with either black spot or citrus canker.
This dataset was sourced from Kaggle and consists of around 2,400 observations.
The team consists of four members:
Please note that the observations contained within the report are for a single run of the project. With multiple editors, and therefore multiple runs of the project, the data may differ slightly from one run to the next. For this reason, the figures referred to in an analysis may differ slightly from what is observed, but the values are approximately the same and the analysis itself should not be altered significantly by this difference.
import subprocess
import platform
import os
import itertools
import time
from IPython.display import display, HTML, Markdown, clear_output
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import (
    utils,
    models,
    layers,
    metrics,
    preprocessing,
    callbacks,
)
from tensorflow.keras.models import Sequential
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Reshape,
    Input,
    Dense,
    Dropout,
    Activation,
    Flatten,
    Conv2D,
    MaxPooling2D,
    average,
    Add,
    concatenate,
)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.regularizers import l2
import tensorflow_addons as tfa
from sklearn import metrics
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    make_scorer,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    ConfusionMatrixDisplay,
    roc_curve,
    auc,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    train_test_split,
    StratifiedKFold,
    cross_val_score,
)
from scipy.signal import savgol_filter
def clear_screen():
    time.sleep(2)
    print("Clearing screen...")
    time.sleep(2)
    clear_output()


clear_screen()
LIMIT_GPU_MEMORY = False
if LIMIT_GPU_MEMORY:
    # Enable memory growth so TensorFlow allocates GPU memory on demand
    # rather than claiming it all up front
    physical_devices = tf.config.experimental.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
os.environ["TF_ENABLE_ONEDNN_OPTS"] = str(0)
For evaluating our model's performance, we will primarily use recall. Between the two diseases, citrus canker is more severe, more contagious, and more difficult to control than black spot. We therefore want to weigh false negatives (the model predicts black spot when the fruit actually has citrus canker) heavily as we evaluate whether our model is performing well or not.
Beyond recall, we also use accuracy, misclassification, precision, and F1-score to measure and compare algorithm performance.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall.
Additionally, we graph the values of Loss and AUC (Area Under Curve) for comparisons.
We want to consider all of these metrics since not only do we want our models to perform well, but we also want the true positive rate to be as high as possible for both classes. Given the importance of recognizing false positives and false negatives, especially since citrus canker is more severe than black spot, we need to measure and quantify the performance of our models by all of these metrics.
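As a quick sanity check, every metric above can be derived from a 2x2 confusion matrix. The sketch below uses an illustrative matrix (the counts are hypothetical, chosen to mirror the scale of our test set), not the output of an actual model run:

```python
import numpy as np

# Illustrative confusion matrix: rows are true labels, columns are predictions
# (class 0 = citrus canker, class 1 = black spot; counts are hypothetical)
cm = np.array([[196, 10],
               [9, 192]])

tn, fp = cm[0, 0], cm[0, 1]
fn, tp = cm[1, 0], cm[1, 1]

accuracy = (tp + tn) / cm.sum()          # overall correctness
misclass = 1 - accuracy                  # 1 minus accuracy
precision = tp / (tp + fp)               # correctness among predicted positives
recall = tp / (tp + fn)                  # coverage of actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```

With these counts, accuracy comes out to about 0.9533 and misclassification to about 0.0467, matching the 1-minus-accuracy relationship described above.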
✅ Choose and explain what metric(s) you will use to evaluate your algorithm’s performance. You should give a detailed argument for why this (these) metric(s) are appropriate on your data. That is, why is the metric appropriate for the task (e.g., in terms of the business case for the task). Please note: rarely is accuracy the best evaluation metric to use. Think deeply about an appropriate measure of performance.
We have around 2,400 observations, so the amount of data is not a concern. The dataset comes pre-split into training and testing folders. We also assessed the distribution of the class variable and found that the target class is evenly distributed.
We feel that the data is already well-structured and there is not a need to split the data into subsets or "folds". The training and testing folders appear to be representative of the overall population of images that the model is expected to encounter.
In discussions with Dr. Alford and our TA, we received confirmation that we may proceed with the current testing and training datasets as they are, without performing cross-validation.
As we do this, we are aware of the risks of not performing cross-validation, such as overfitting, which we will keep in mind as we evaluate our model.
We additionally received confirmation that it would be acceptable to use grayscale versions of the images for the CNN models.
We feel that not using cross-validation can be justified for a few reasons:
For one, the dataset is large enough and representative of the overall population of images, so the model can learn from the full training set without our having to split it into subsets.
Additionally, because the training and testing folders have a balanced distribution of images across both classes, cross-validation may not be necessary; the model can again learn from the full dataset without splitting it into subsets.
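The balance claim above is easy to verify numerically. A minimal sketch, using a small hypothetical label vector in place of our actual y_train:

```python
import numpy as np

# Hypothetical integer labels, as produced by image_dataset_from_directory
# with label_mode="int"; in the notebook this would be y_train
y = np.array([0, 1, 0, 1, 1, 0, 0, 1])

counts = np.bincount(y, minlength=2)   # number of images per class
balance = counts.min() / counts.max()  # 1.0 means perfectly balanced
```

A balance ratio near 1.0 supports training on the full split without stratified folds.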
# Change this variable to True to update the requirements file
update_environment = False
env_name = "lab6"
filename = "environment.yml"
command = f"conda env export -n {env_name} --from-history > {filename}"
if update_environment:
    if platform.system() == "Windows":
        subprocess.run(command, shell=True)
    else:
        subprocess.run(command, shell=True, executable="/bin/bash")
    print("Environment updated.")
else:
    print("Environment not updated.")
# Install environment with:
# conda create --file environment.yml
Environment not updated.
Our code below loads images of our two classes - "citrus canker" and "black spot" - and returns their normalized and resized versions. The function takes a directory path to our image folder as input and returns two lists: one containing the images and the other containing their corresponding class labels.
It then combines the training and testing datasets into two arrays, one for the images and one for the labels.
The class distribution of the dataset is visualized with a bar chart. The data is then split into training and testing sets using the train_test_split() function from scikit-learn.
Finally, the function plot_gallery() is defined to display a gallery of images from the training set with their respective class labels.
The code below defines two image datasets using the image_dataset_from_directory function provided by TensorFlow. One dataset is for training and the other dataset is for testing, with the images in both datasets resized to 64x64 pixels. We then display a sample of the images from the training dataset with their corresponding class labels.
Additionally, the pixel values of the images in both datasets are normalized to be between 0 and 1 using the normalize function.
Finally, the process_ds_to_numpy function is defined, which converts the datasets to numpy arrays for use in a machine learning model by iterating through each image in the dataset and appending its pixel values to a list. The function then concatenates the lists to create a numpy array. The resulting numpy arrays for the training and testing datasets are x_train, y_train, x_test, and y_test.
img_width, img_height = 64, 64
img_color_mode = "grayscale" # can also be 'rgb'
classes = {0: "citrus canker", 1: "black spot"}
n_classes = 2
train_ds = tf.keras.utils.image_dataset_from_directory(
    "./train/",
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode=img_color_mode,
    batch_size=32,
    image_size=(img_width, img_height),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
    crop_to_aspect_ratio=False,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "./test/",
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode=img_color_mode,
    batch_size=32,
    image_size=(img_width, img_height),
    shuffle=True,
    seed=None,
    validation_split=None,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
    crop_to_aspect_ratio=False,
)
class_names = train_ds.class_names
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(25):
        ax = plt.subplot(5, 5, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"), cmap="gray")
        plt.title(class_names[labels[i]])
        plt.axis("off")
plt.suptitle("Image Classes")
plt.show()
Found 2032 files belonging to 2 classes.
2023-05-04 00:02:05.809601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3373 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3060, pci bus id: 0000:01:00.0, compute capability: 8.6
Found 407 files belonging to 2 classes.
This code defines functions for normalizing the pixel values of images and converting TensorFlow datasets into numpy arrays.
The normalize function takes an image and its label as input, scales the pixel values of the image to be between 0 and 1, and returns the normalized image and its label. The train_ds and test_ds datasets are then mapped through the normalize function to normalize the pixel values of their images.
def normalize(image, label):
    """Normalize the pixel values of the image to be between 0 and 1."""
    return tf.cast(image, tf.float32) / 255.0, label


train_ds = train_ds.map(normalize)
test_ds = test_ds.map(normalize)
The process_ds_to_numpy function takes a dataset as input, iterates over the elements of the dataset, converts the images and labels to numpy arrays, and returns the concatenated numpy arrays of images and labels. Finally, the process_ds_to_numpy function is used to convert the train_ds and test_ds datasets to numpy arrays, and the last line of the code determines the number of dimensions of the x_train numpy array.
def process_ds_to_numpy(ds) -> tuple:
    """Returns the x, y numpy arrays from the dataset."""
    x, y = [], []
    for image, label in ds.as_numpy_iterator():
        x.append(np.array(image, dtype=np.float32))
        y.append(np.array(label, dtype=np.int32))
    return np.concatenate(x, axis=0), np.concatenate(y, axis=0)


x_train, y_train = process_ds_to_numpy(train_ds)
x_test, y_test = process_ds_to_numpy(test_ds)
n_dimensions = x_train.shape[-1]
After loading and preprocessing the data, we will display the first image in the training dataset with its class label.
plt.imshow(x_train[0].squeeze(), cmap='gray')  # drop the channel axis for imshow
plt.axis('off')
plt.show()
The ImageDataGenerator class applies various image transformations on-the-fly during model training; here we use rotations, shifts, and horizontal flips. The generator is then fitted to the training data using the fit method.
Creating more image transformations from the existing data allows us to expand the effective size of our training dataset, and with the rotation, shift, and flip techniques, our models will be exposed to different image angles and will be able to make better predictions based on the original and augmented data. In the context of our citrus disease dataset, the diseases have defining traits but come in countless variations and present differing physical symptoms. By creating images from the existing ones, we can generate different variations of each disease and train our models while also helping prevent overfitting.
datagen = preprocessing.image.ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    vertical_flip=False,
)
# Fits the data to the generator.
datagen.fit(x_train)
We one-hot encode the target labels of the training and test dataset by using the to_categorical function from keras.utils.
One-hot encoding is necessary for the model to output probabilities for each class during training and evaluation.
# One-hot encodes the inputs
y_train_ohe = utils.to_categorical(y_train, n_classes)
y_test_ohe = utils.to_categorical(y_test, n_classes)
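For intuition, to_categorical amounts to indexing into an identity matrix. A minimal numpy equivalent (a sketch, not the Keras implementation itself):

```python
import numpy as np

def one_hot(labels, n_classes):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(n_classes, dtype="float32")[labels]

encoded = one_hot(np.array([0, 1, 1, 0]), 2)
```

Each integer label becomes a length-2 probability-style vector, which is the shape the softmax output layer is trained against.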
The code takes the training and testing datasets (previously processed into numpy arrays) and expands their dimensions to accommodate a new format.
Specifically, it reshapes the arrays to have a sample size of -1, and an image size of 64x64 with one channel, and then expands the dimensions of the arrays to include the channel dimension, resulting in a final shape of (samples, image_rows, image_cols, image_channels).
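The reshaping described above can be sketched as follows; the zero array is a stand-in for our image data:

```python
import numpy as np

# Stand-in for a batch of 5 grayscale 64x64 images without a channel axis
x = np.zeros((5, 64, 64), dtype=np.float32)

x = x.reshape(-1, 64, 64)        # -1 lets numpy infer the sample count
x = np.expand_dims(x, axis=-1)   # add the single grayscale channel
```

The result has the (samples, image_rows, image_cols, image_channels) layout that Keras convolutional and generator APIs expect.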
for tmp in datagen.flow(x_train, y_train_ohe, batch_size=1):
    plt.imshow(cv2.resize(tmp[0].squeeze(), (64, 64)), cmap='gray')
    plt.title("Generated Orange")
    plt.axis("off")
    break
Above, we create a flow iterator using the previously defined image data generator to produce augmented images in batches of size one from the training set, and we plot one of the augmented images using matplotlib.
def plot_confusion_matrix(
    cm,
    target_names,
    title="Confusion matrix",
    cmap=None,
    normalize=True,
    class_results: dict = {},
):
    """
    Given a sklearn confusion matrix (cm), make a nice plot

    Arguments
    ---------
    cm:           confusion matrix from sklearn.metrics.confusion_matrix

    target_names: given classification classes such as [0, 1, 2]
                  the class names, for example: ['high', 'medium', 'low']

    title:        the text to display at the top of the matrix

    cmap:         the gradient of the values displayed from matplotlib.pyplot.cm
                  see http://matplotlib.org/examples/color/colormaps_reference.html
                  plt.get_cmap('jet') or plt.cm.Blues

    normalize:    If False, plot the raw numbers
                  If True, plot the proportions

    Usage
    -----
    plot_confusion_matrix(cm=cm,                      # confusion matrix created by
                                                      # sklearn.metrics.confusion_matrix
                          normalize=True,             # show proportions
                          target_names=y_labels_vals, # list of names of the classes
                          title=best_estimator_name)  # title of graph

    Citation
    --------
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    """
    accuracy = np.trace(cm) / float(np.sum(cm))
    misclass = 1 - accuracy

    if cmap is None:
        cmap = plt.get_cmap("Blues")

    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation="nearest", cmap=cmap)
    plt.title(title)
    plt.colorbar()

    if target_names is not None:
        tick_marks = np.arange(len(target_names))
        plt.xticks(tick_marks, target_names, rotation=45)
        plt.yticks(tick_marks, target_names)

    if normalize:
        cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]

    thresh = cm.max() / 1.5 if normalize else cm.max() / 2
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        if normalize:
            plt.text(
                j,
                i,
                "{:0.4f}".format(cm[i, j]),
                horizontalalignment="center",
                color="white" if cm[i, j] > thresh else "black",
            )
        else:
            plt.text(
                j,
                i,
                "{:,}".format(cm[i, j]),
                horizontalalignment="center",
                color="white" if cm[i, j] > thresh else "black",
            )

    plt.tight_layout()
    plt.ylabel("True label")
    if class_results:
        x_lab = "Predicted label\n\n"
        x_lab += f"accuracy={accuracy:0.4f}\nmisclass={misclass:0.4f}\n"
        for key, value in class_results.items():
            x_lab += f"{key}={value:0.4f}\n"
    else:
        x_lab = "Predicted label\n\n"
        x_lab += f"accuracy={accuracy:0.4f}\nmisclass={misclass:0.4f}\n"
    plt.xlabel(x_lab)
    plt.show()
Below, we define a helper function named compare_mlp_cnn which takes two trained models (model_1, model_2), the test features and labels (X_test, y_test), a title for each subplot (title_1, title_2), and an optional list of class labels (labels) as inputs.
The function compares the performance of the CNN and MLP models on the test data by predicting the class labels and computing accuracy scores and confusion matrices. The results are visualized using a heatmap from the seaborn library.
def compare_mlp_cnn(model_1, model_2, X_test, y_test, title_1: str, title_2: str, labels='auto'):
    plt.figure(figsize=(15, 5))
    if model_1 is not None:
        yhat_model_1 = np.argmax(model_1.predict(X_test), axis=1)
        acc_model_1 = metrics.accuracy_score(y_test, yhat_model_1)
        plt.subplot(1, 2, 1)
        cm = metrics.confusion_matrix(y_test, yhat_model_1)
        cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
        sns.heatmap(cm, annot=True, fmt='.2%', xticklabels=labels, yticklabels=labels)
        plt.title(f"{title_1}: {acc_model_1:.3f}")
    if model_2 is not None:
        yhat_model_2 = np.argmax(model_2.predict(X_test), axis=1)
        acc_model_2 = metrics.accuracy_score(y_test, yhat_model_2)
        plt.subplot(1, 2, 2)
        cm = metrics.confusion_matrix(y_test, yhat_model_2)
        cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
        sns.heatmap(cm, annot=True, fmt='.2%', xticklabels=labels, yticklabels=labels)
        plt.title(f"{title_2}: {acc_model_2:.3f}")
Below, we define a helper function named plot_history that takes the history object as an argument, which is a dictionary containing information about the training process of a model. The function creates a plot of the loss values of the training and validation sets over epochs using the matplotlib library. The plot shows the training loss in blue and the validation loss in orange. The x-axis represents the number of epochs, and the y-axis represents the loss value. The plot is then displayed using plt.show().
This function is useful for evaluating the performance of a model during training and identifying overfitting or underfitting.
def plot_history(history):
    # Plot the training and validation loss over epochs
    fig = plt.figure(figsize=(10, 8))
    plt.plot(history.history["loss"], label="train")
    plt.plot(history.history["val_loss"], label="test")
    plt.legend()
    plt.xlabel("Epochs")
    plt.ylabel("Loss")
    plt.title("Loss over epochs")
    plt.show()
The helper function summarize_net takes in a neural network model (net), test set features (x_test), and test set labels (y_test), along with an optional string parameter for the title of the plot.
It then uses the predict method of the model to predict labels for the test set and calculates the accuracy score between the predicted labels and actual labels. It also calculates the confusion matrix, normalizes it and plots it as a heatmap using the Seaborn library. The title of the plot includes the accuracy score formatted to four decimal places and the optional title text passed to the function.
def summarize_net(net, x_test, y_test, title_text=""):
    label = ["citrus canker", "black spot"]
    plt.figure(figsize=(15, 5))
    yhat = np.argmax(net.predict(x_test), axis=1)
    acc = metrics.accuracy_score(y_test, yhat)
    cm = metrics.confusion_matrix(y_test, yhat)
    cm = cm / np.sum(cm, axis=1)[:, np.newaxis]
    sns.heatmap(cm, annot=True, fmt=".2%", xticklabels=label, yticklabels=label)
    plt.title(title_text + "{:.4f}".format(acc))
The following code defines a 3-layer perceptron (MLP) using Keras. The input images are flattened and then fed into the MLP. The first and second hidden layers have 30 and 15 units, respectively, with ReLU activation functions. The output layer has as many units as the number of classes and uses a softmax activation function. The model is compiled with the Adamax optimizer and mean squared error loss function, and includes Recall and AUC metrics for evaluation. The model is trained on the training data (x_train and y_train_ohe) with a batch size of 50 for 250 epochs, and the training history is saved in mlp_history. Lastly, the model summary is printed.
# Makes a 3-layer Keras MLP
mlp = models.Sequential()
# Makes the images flat for the MLP input
mlp.add(layers.Flatten())
mlp.add(layers.Dense(units=30, activation="relu"))
mlp.add(layers.Dense(units=15, activation="relu"))
mlp.add(layers.Dense(n_classes))
mlp.add(layers.Activation("softmax"))
# Compiles the model
mlp.compile(
    optimizer="Adamax",
    loss="mean_squared_error",
    metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to the training data
mlp_history = mlp.fit(
    x_train,
    y_train_ohe,
    batch_size=50,
    epochs=250,
    verbose=1,
    validation_data=(x_test, y_test_ohe),
)
clear_screen()
# Prints model summary
mlp.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
flatten (Flatten) (None, 4096) 0
dense (Dense) (None, 30) 122910
dense_1 (Dense) (None, 15) 465
dense_2 (Dense) (None, 2) 32
activation (Activation) (None, 2) 0
=================================================================
Total params: 123,407
Trainable params: 123,407
Non-trainable params: 0
_________________________________________________________________
The code below uses the plot_model function from the utils module to plot a visualization of the model created in the previous code block.
The plot_model function takes several arguments:
The output of the plot_model function is a visualization of the MLP model in a PNG image file. This can be useful for visualizing the structure of the model, and for communicating the model to others.
# Plots the graph
utils.plot_model(
    mlp,
    to_file="model.png",
    show_shapes=True,
    show_layer_names=True,
    rankdir="LR",
    expand_nested=False,
    dpi=96,
)
The code below evaluates the performance of an MLP model on a test set by predicting the class probabilities and rounding them to obtain class predictions. It then prints the classification report which includes several performance metrics such as precision, recall, f1-score and support for each class as well as their macro and weighted averages.
Below, the classification report shows that the model performs at a high level, with a precision and recall of 0.98 for both classes. The f1-score is also high at 0.98 for both classes. The support column displays the number of samples in each class; in this case, the two classes have nearly equal support, with 206 samples of class 0 and 201 samples of class 1. The micro avg, macro avg, weighted avg, and samples avg rows correspond to different averaging methods used to calculate the performance metrics across all classes.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = mlp.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict))
13/13 [==============================] - 0s 624us/step
precision recall f1-score support
0 0.98 0.98 0.98 206
1 0.98 0.98 0.98 201
micro avg 0.98 0.98 0.98 407
macro avg 0.98 0.98 0.98 407
weighted avg 0.98 0.98 0.98 407
samples avg 0.98 0.98 0.98 407
Note: the graph numbers epochs starting at 0 instead of 1 (the graph's epoch 0 is actually epoch 1, epoch 1 is actually epoch 2, and epoch 14 is actually epoch 15). For the sake of clarity, the analysis below uses the graph's numbering notation for epochs.
Observing the performance of the MLP with the dataset, the curve appears to oscillate up and down (to higher and lower costs). From the Loss graph below we see the cost start at a little less than 0.250 and over time steadily decreases to about 0.05.
The oscillations in the Loss curve could be due to the model overfitting the training data. This means that the model may be learning the noise and idiosyncrasies of the training data, rather than the underlying patterns and relationships. As a result, the model may perform well on the training set but poorly on the testing set.
Another possible reason for the oscillations could be due to the learning rate used during training. If the learning rate is too high, the model may overshoot the optimal weights and biases and then struggle to converge. If the learning rate is too low, the model may take too long to converge and get stuck in local minima.
Finally, the oscillations could be due to the size and complexity of the model. If the model is too large and complex for the dataset, it may struggle to generalize to new data and instead memorize the training set. This can lead to overfitting and poor performance on the testing set.
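The learning-rate effect described above can be illustrated on a toy quadratic, minimizing f(x) = x^2 by gradient descent. This is a sketch for intuition only, unrelated to the Keras optimizer we actually use:

```python
def gradient_descent(lr, steps=20, x=1.0):
    """Run gradient descent on f(x) = x**2 (gradient 2x); return final |x|."""
    for _ in range(steps):
        x -= lr * 2 * x
    return abs(x)

small = gradient_descent(0.1)   # converges smoothly toward the minimum at 0
large = gradient_descent(1.1)   # overshoots each step, oscillating and diverging
```

With a small step size the iterate shrinks geometrically toward the optimum; with too large a step it flips sign each update and grows, which is the oscillate-and-overshoot behavior described above.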
# Variables for determining the loss over epochs
epochs = mlp_history.epoch
loss = mlp_history.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss)
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the MLP model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.95 indicating that the model has a good ability to distinguish between positive and negative cases. The ROC curve visualizes this by showing a steep rise in the true positive rate as the false positive rate is kept low, indicating that the model is able to identify positive cases while keeping false positives to a minimum. Overall, this suggests that the MLP model is performing well on the test set and is able to accurately classify the data.
# Variables for determining the ROC/AUC
# Use the predicted probability of class 1 so the ROC sweeps over thresholds
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba[:, 1])
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
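For intuition about what roc_curve and auc compute, here is a hand-rolled sketch. It is a simplified version that ignores tied scores, not the scikit-learn implementation:

```python
import numpy as np

def roc_auc_by_hand(y_true, scores):
    # Sort by descending score, sweep the threshold past one example at a
    # time, accumulate TPR/FPR, and integrate with the trapezoidal rule
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(y_true)[order]
    tpr = np.concatenate([[0.0], np.cumsum(y) / y.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / (len(y) - y.sum())])
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

perfect = roc_auc_by_hand([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

When every positive example scores above every negative one, the curve hugs the top-left corner and the area is 1.0; a classifier that ranks them exactly backwards scores 0.0.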
The code below plots the training versus testing graph for the MLP model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the MLP model is improving over time, and that it is performing well on both the training and validation sets.
We do observe that the validation loss curve is much more sporadic than the training curve. This may be due to overfitting: the model may be trained so well on the training data that it begins to memorize the data instead of learning from it. As a result, the model performs very well on the training data but poorly on new, unseen data, which is what the validation set represents. The sporadic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
# Model history values
hist_values = list(mlp_history.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.tight_layout()
plt.show()
The code below generates a confusion matrix and plots it using the plot_confusion_matrix function. The confusion matrix shows the number of true positive, true negative, false positive, and false negative predictions for each class. In this case, the matrix shows the number of correctly and incorrectly classified images for Citrus Canker and Black Spot diseases.
From the confusion matrix, we can see that the MLP predicted Citrus Canker correctly 196 times and black spot correctly 192 times. However, it incorrectly predicted Citrus Canker 10 times when the true class was black spot, and black spot 9 times when the true class was Citrus Canker. Overall, the MLP model has a good performance in classifying the two classes, but there is a little room for improvement in reducing the number of false positives and false negatives.
Additionally, we calculate scores such as accuracy, misclassification, precision, recall, f1-score, and support.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .9533 or roughly 95% accuracy.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.0467 - less than 5%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .9533 or 95%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .9533 or 95%.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .9533 or roughly 95%.
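As a small sketch of how these scores relate to one another, using made-up labels rather than this run's predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy labels and predictions (not the report's actual data):
# TN=3, FP=1, FN=1, TP=3
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total = 6/8
misclass = 1.0 - accuracy                    # 2/8
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                # 2*(p*r)/(p+r) = 3/4 here
```

With precision and recall equal, the harmonic mean collapses to the same value, which is also why several of the weighted-average scores reported in this section coincide.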
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
)
The code below creates a convolutional neural network (CNN) named cnn1 with a single convolutional layer (16 3x3 filters) followed by ReLU activation and two max-pooling stages. A dense layer with 100 units and ReLU activation is added on the flattened output, followed by another dense layer with 50 units and ReLU activation, and a final dense layer with softmax activation that outputs the predicted probabilities for each class. The model is compiled with mean squared error as the loss function and RMSprop as the optimizer, then trained on the training data with a batch size of 50, for 150 epochs, with validation data provided for monitoring performance during training.
# Creates a CNN with convolution layer and max pooling
cnn1 = models.Sequential()
cnn1.add(
layers.Conv2D(
filters=16,
kernel_size=(3, 3),
padding="same",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn1.add(Activation("relu"))
cnn1.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
cnn1.add(Activation("relu"))
cnn1.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
# Adds 1 layer on flattened output
cnn1.add(Flatten())
cnn1.add(Dense(100, activation="relu"))
cnn1.add(Dense(50, activation="relu"))
cnn1.add(Dense(n_classes, activation="softmax"))
# Compiles the model
cnn1.compile(
optimizer="rmsprop",
loss="mean_squared_error",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to training data
history1 = cnn1.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=150,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
)
clear_screen()
# Prints model summary
cnn1.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                    Output Shape           Param #
=================================================================
 conv2d (Conv2D)                 (None, 64, 64, 16)     160
 activation_1 (Activation)       (None, 64, 64, 16)     0
 max_pooling2d (MaxPooling2D)    (None, 32, 32, 16)     0
 activation_2 (Activation)       (None, 32, 32, 16)     0
 max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 16)     0
 flatten_1 (Flatten)             (None, 4096)           0
 dense_3 (Dense)                 (None, 100)            409700
 dense_4 (Dense)                 (None, 50)             5050
 dense_5 (Dense)                 (None, 2)              102
=================================================================
Total params: 415,012
Trainable params: 415,012
Non-trainable params: 0
_________________________________________________________________
Data augmentation is a technique used to artificially increase the size of a training dataset by creating new, slightly modified versions of existing images. This can help the model learn more robust features and generalize better to new, unseen images.
Below, we use a data generator to feed the CNN model cnn1_flow with augmented data during training. Specifically, the fit call takes the iterator returned by datagen.flow, which applies random transformations to the training data to create new training images and provides a batch of images and labels on each iteration, and returns a history object containing the training and validation loss and metrics over the epochs. The batch_size parameter specifies the number of samples per gradient update, and the verbose parameter controls the verbosity of the output during training.
Early stopping is a regularization technique used in machine learning to prevent overfitting by stopping training when the performance on the validation set stops improving.
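The datagen generator used below is defined earlier in the notebook. A minimal configuration along the following lines would produce the kind of augmented batches described above; the specific transform ranges here are illustrative assumptions, not the values used in this run:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings; the notebook's actual datagen may differ
datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations of up to 20 degrees
    width_shift_range=0.1,   # horizontal shifts of up to 10% of the width
    height_shift_range=0.1,  # vertical shifts of up to 10% of the height
    horizontal_flip=True,    # randomly mirror images left-right
)

# flow() returns an iterator that yields (images, labels) batches indefinitely
x = np.random.rand(8, 64, 64, 3).astype("float32")
y = np.eye(2)[np.random.randint(0, 2, size=8)]
batch_x, batch_y = next(datagen.flow(x, y, batch_size=4))
```

Because each epoch sees freshly transformed copies of the images, the effective training set is larger than the raw dataset, at the cost of noisier epoch-to-epoch statistics.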
# Creates a CNN with convolution layer and max pooling for use with
# flow generator.
cnn1_flow = models.Sequential()
cnn1_flow.add(
layers.Conv2D(
filters=16,
kernel_size=(3, 3),
padding="same",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn1_flow.add(Activation("relu"))
cnn1_flow.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
cnn1_flow.add(Activation("relu"))
cnn1_flow.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
# Adds 1 layer on flattened output
cnn1_flow.add(Flatten())
cnn1_flow.add(Dense(100, activation="relu"))
cnn1_flow.add(Dense(50, activation="relu"))
cnn1_flow.add(Dense(n_classes, activation="softmax"))
# Compiles the model
cnn1_flow.compile(
optimizer="rmsprop",
loss="mean_squared_error",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to training data using flow generator
history1_1 = cnn1_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=128),
steps_per_epoch=len(x_train) // 128,
epochs=150,
shuffle=True,
verbose=1,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=10, start_from_epoch=100
)
],
)
clear_screen()
# Prints model summary
cnn1_flow.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                    Output Shape           Param #
=================================================================
 conv2d_1 (Conv2D)               (None, 64, 64, 16)     160
 activation_3 (Activation)       (None, 64, 64, 16)     0
 max_pooling2d_2 (MaxPooling2D)  (None, 32, 32, 16)     0
 activation_4 (Activation)       (None, 32, 32, 16)     0
 max_pooling2d_3 (MaxPooling2D)  (None, 16, 16, 16)     0
 flatten_2 (Flatten)             (None, 4096)           0
 dense_6 (Dense)                 (None, 100)            409700
 dense_7 (Dense)                 (None, 50)             5050
 dense_8 (Dense)                 (None, 2)              102
=================================================================
Total params: 415,012
Trainable params: 415,012
Non-trainable params: 0
_________________________________________________________________
The code below uses the plot_model function from the utils module to plot a visualization of the model created in the previous code block.
The plot_model function takes several arguments:
The output of the plot_model function is a visualization of the CNN model in a PNG image file. This can be useful for visualizing the structure of the model, and for communicating the model to others.
# Plots the graph
utils.plot_model(
cnn1,
to_file="model.png",
show_shapes=True,
show_layer_names=True,
rankdir="LR",
expand_nested=False,
dpi=96,
)
Below we define two arrays, y_predict_proba and y_predict, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained cnn1 model. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = cnn1.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict))
13/13 [==============================] - 0s 828us/step
precision recall f1-score support
0 1.00 1.00 1.00 206
1 1.00 1.00 1.00 201
micro avg 1.00 1.00 1.00 407
macro avg 1.00 1.00 1.00 407
weighted avg 1.00 1.00 1.00 407
samples avg 1.00 1.00 1.00 407
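The np.round step used above thresholds each class probability at 0.5 independently. A small illustration with made-up softmax outputs:

```python
import numpy as np

# Made-up softmax outputs for three test images (each row sums to 1)
y_predict_proba = np.array([
    [0.93, 0.07],
    [0.40, 0.60],
    [0.50, 0.50],
])

# np.round maps each probability to 0 or 1 independently; note that an
# exact 0.50/0.50 tie rounds to [0, 0] (round-half-to-even), producing a
# row with no predicted class, unlike np.argmax, which always picks one
y_predict = np.round(y_predict_proba)
```

This edge case is rare in practice, but it is why the confusion-matrix code elsewhere in this report uses np.argmax rather than rounding.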
Below we again define two arrays, y_predict_proba_flow and y_predict_flow, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained CNN model with Flow Generation and Early Stopping. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict_flow.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba_flow = cnn1_flow.predict(x_test)
y_predict_flow = np.round(y_predict_proba_flow)
# Prints classification report
print(classification_report(y_test_ohe, y_predict_flow))
13/13 [==============================] - 0s 957us/step
precision recall f1-score support
0 0.88 0.74 0.81 206
1 0.77 0.90 0.83 201
micro avg 0.82 0.82 0.82 407
macro avg 0.83 0.82 0.82 407
weighted avg 0.83 0.82 0.82 407
samples avg 0.82 0.82 0.82 407
Note: the graph numbers epochs starting at 0 instead of 1 (epoch 0 is actually epoch 1, epoch 1 is actually epoch 2, epoch 14 is actually epoch 15). For the sake of clarity, the analysis below uses the graph's numbering notation when referring to epochs.
Observing the performance of CNN 1 on the dataset, the curve is mostly stable and the loss decreases across the epochs, starting at a loss value of 0.25 and ending with a loss value around 0.02.
With the Flow Generator and Early Stopping, the curve decreases on average, but it looks slightly jagged, with the loss rising and falling throughout the epochs. The curve starts at a loss value of 0.25 and decreases, but not as steadily as the CNN 1 curve. It is also much more volatile than the CNN 1 training curve, and it finishes with a higher loss value (0.15) at epoch 110.
These observations could be attributed to the effect of early stopping. Early stopping is a regularization technique that halts training before the model overfits the data, based on certain stopping criteria. While this can help improve generalization and prevent overfitting, it can also produce a more jagged learning curve as the model adjusts to changes in the training data. This can be especially true if the early stopping criterion is too strict or if the model architecture is not well suited to the data. Additionally, the Flow Generator introduces extra variability into the training process, since each epoch sees differently augmented images, which can further impact the smoothness of the curve.
# Variables for determining the loss over epochs
epochs = history1.epoch
loss = history1.history["loss"]
epochs_flow = history1_1.epoch
loss_flow = history1_1.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss, label="CNN")
plt.plot(epochs_flow, loss_flow, label="CNN (Flow Generator) w/ Early Stopping")
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.legend()
plt.show()
In the below confusion matrix, we will use several prediction scores to measure our model's predictions.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .9926 or roughly 99% accuracy. This is an excellent score and indicates a very well performing model.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.0074 - less than 1%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .9927 or 99%. This is slightly higher than our accuracy value which we feel is ideal for finding a viable solution for field-use as it is important to minimize false positives and maximize true positives.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .9926 or 99% - same as our accuracy.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .9926 or roughly 99% - same as our accuracy.
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN Confusion Matrix"
)
In the below confusion matrix, we will use several prediction scores to measure our model's predictions.
From the heatmap, we observe that this model mispredicted 92 images: 79 images it predicted as citrus canker when the true class was black spot, and 13 images it predicted as black spot when the true class was citrus canker. By a significant margin, this model struggled most with black spot images being misclassified as citrus canker.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .7740 or roughly 77% accuracy. This score is significantly lower than the previous accuracy without the Flow Generator and Early Stopping, a drop of roughly 22 percentage points.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.2260 or 23%. This increase follows logically from the drop in accuracy.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .8054 or 81%. This is slightly higher than our accuracy value which we feel is ideal for finding a viable solution for field-use as it is important to minimize false positives and maximize true positives. Yet this is still lower than our previous precision.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .7740 or 77% - same as our accuracy.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .7674 or roughly 77%.
class_report = metrics.classification_report(
y_test_ohe, y_predict_flow, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict_flow, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN (With Flow Generator) Confusion Matrix"
)
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 1 model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 1.0 - which indicates that the model has excellent ability to distinguish between positive and negative cases.
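As a toy illustration of how roc_curve and auc behave, separate from this run's predictions:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy binary labels and classifier scores (not this run's predictions)
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold over the scores;
# auc integrates the true positive rate over the false positive rate
fpr, tpr, thresholds = roc_curve(y_true, scores)
roc_auc = auc(fpr, tpr)  # 0.75 for these toy values
```

Passing continuous scores (rather than hard 0/1 labels) is what lets roc_curve trace out multiple operating points along the curve.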
# Variables for determining the ROC/AUC
# Note: argmax yields hard 0/1 labels, so the curve has a single operating
# point; passing y_predict_proba[:, 1] instead would trace the full sweep
fpr, tpr, threshold = roc_curve(y_test, np.argmax(y_predict_proba, axis=1))
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 1 - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 1 (with Flow Generator and Early Stopping) model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is .77 - which is significantly lower than the AUC for the regular CNN 1 model.
# Variables for determining the ROC/AUC
fpr, tpr, threshold = roc_curve(y_test, np.argmax(y_predict_proba_flow, axis=1))
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 1 (Flow Generator) - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The below code plots the training versus testing graph for the CNN 1 model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 1 model is improving over time, and that it is performing well on both the training and validation sets.
We do observe that the validation loss curve is much more erratic than the training curve. This may be a sign of overfitting: the model may fit the training data so closely that it begins to memorize it rather than learning generalizable patterns. As a result, the model performs very well on the training data but poorly on new, unseen data, which is what the validation set represents. The erratic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
# Model history values
hist_values = list(history1.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 1")
plt.tight_layout()
plt.show()
The below code plots the training versus testing graph for the CNN 1 with Flow Generator and Early Stopping model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 1 model with Flow Generator is improving over time but that it is also not performing as well with regards to overall higher loss score, lower recall score, and lower AUC score when compared to the previous CNN 1 Model.
As before, we observe that the validation loss curve is much more erratic than the training curve. This may be a sign of overfitting: the model may fit the training data so closely that it begins to memorize it rather than learning generalizable patterns. As a result, the model performs very well on the training data but poorly on new, unseen data, which is what the validation set represents. The erratic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
# Model history values
hist_values = list(history1_1.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 1 With Flow Generator")
plt.tight_layout()
plt.show()
In the below heat map, the x-axis represents the model's prediction while the y-axis represents the correct or actual value.
In comparing the CNN and MLP heat maps, we see that the CNN heat map gets a 100% correct prediction on citrus cankers - whereas the MLP correctly predicted citrus cankers 96.60% of the time.
We also see that MLP correctly predicted black spot 98.01% of the time, compared to CNN, which predicted black spot correctly 98.51% of the time. This means that CNN was about 3.4 percentage points more accurate in its predictions of citrus cankers and about 0.5 percentage points more accurate in predicting black spots than MLP.
Overall, we see excellent performance and accuracy from both CNN 1 and MLP - however, CNN 1 certainly seems to perform better than MLP based on the heatmaps.
One reason for the differences in performance could be the architectural differences between CNN and MLP. Convolutional Neural Networks are designed to capture spatial relationships in the data by applying filters to identify patterns and features. On the other hand, Multi-Layer Perceptrons are a type of feedforward neural network that can learn non-linear relationships between input and output data.
Thus, CNN may perform better when dealing with image data, as it can better capture the spatial relationships between pixels. Meanwhile, MLP may perform better with data that is more structured and can be better represented by a set of features or attributes. Therefore, depending on the nature of the data and the problem at hand, the choice of architecture can have a significant impact on the performance and accuracy of the model.
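One concrete way to see this architectural difference is parameter economy: a convolutional layer shares a small filter across every image location, while a fully connected layer needs a separate weight for every input pixel. A back-of-the-envelope comparison for a 64x64x3 input (generic arithmetic, not the exact layer sizes of this report's models):

```python
# Conv2D parameter count: (kernel_h * kernel_w * in_channels + 1 bias) per filter
conv_params = (3 * 3 * 3 + 1) * 16       # 16 filters of size 3x3 over an RGB input

# Dense parameter count on the flattened image: (in_units + 1 bias) * out_units
dense_params = (64 * 64 * 3 + 1) * 100   # 100 units, one weight per input pixel

print(conv_params, dense_params)  # 448 vs 1,228,900
```

The convolutional layer achieves its feature detection with orders of magnitude fewer weights, which is part of why CNNs tend to generalize better on image data.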
compare_mlp_cnn(
cnn1,
mlp,
x_test,
y_test,
title_1="CNN 1",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 1ms/step
13/13 [==============================] - 0s 542us/step
In comparing the CNN 1 (w/ Flow Generator and Early Stopping) and MLP heat maps, we see that the CNN 1 (Flow Generator) heat map gets a 93.69% correct prediction on citrus cankers, whereas the MLP correctly predicted citrus cankers 95.15% of the time. We also see that MLP correctly predicted black spot 95.52% of the time, compared to CNN 1 (Flow Generator), which predicted black spot correctly 60.70% of the time. This means the CNN was about 1.5 percentage points less accurate in its predictions of citrus cankers and about 35 percentage points less accurate in predicting black spots than MLP.
Overall, we see poorer performance and accuracy from CNN 1 w/ Flow Generation and Early Stopping when compared to MLP and our standard CNN 1.
This may be due to several reasons. One possibility is that the CNN architecture and hyperparameters used were not optimized for the given dataset, while the MLP was able to better fit the data. Additionally, the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn1_flow,
mlp,
x_test,
y_test,
title_1="CNN 1 (Flow Generator)",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 1ms/step
13/13 [==============================] - 0s 951us/step
In comparing the CNN 1 (w/ Flow Generator and Early Stopping) and CNN 1 heat maps, we see that the CNN 1 (Flow Generator) heat map gets a 93.69% correct prediction on citrus cankers, whereas the standard CNN 1 correctly predicted citrus cankers 100% of the time. We also see that CNN 1 correctly predicted black spot 100% of the time, compared to CNN 1 (Flow Generator), which predicted black spot correctly 60.70% of the time. This means the Flow Generator model was about 6.3 percentage points less accurate in its predictions of citrus cankers and about 39 percentage points less accurate in predicting black spots than the standard CNN 1.
Overall, we see much poorer performance and accuracy from CNN 1 w/ Flow Generation and Early Stopping when compared to our standard CNN 1.
This may be due to several reasons. One possibility is that the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn1,
cnn1_flow,
x_test,
y_test,
title_1="CNN 1",
title_2="CNN 1 (Flow Generator)",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 1ms/step
13/13 [==============================] - 0s 985us/step
Below, we define a Convolutional Neural Network (CNN) model named cnn2. It is similar to cnn1 in that it uses convolutional layers, max pooling, flattening, and dense layers, but has a few modifications intended to improve performance: it stacks two convolutional layers (32 and 64 filters), adds dropout layers for regularization, uses wider dense layers (512, 256, and 128 units), a sigmoid output activation, and the Adamax optimizer, while keeping mean squared error as the loss function and Recall and AUC as evaluation metrics. It is trained with a batch size of 50 for up to 150 epochs, with early stopping monitoring the validation loss.
# Creates a CNN with convolution layer and max pooling
cnn2 = models.Sequential()
cnn2.add(
layers.Conv2D(
filters=32,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn2.add(
layers.Conv2D(
filters=64,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn2.add(Activation("relu"))
cnn2.add(
MaxPooling2D(
pool_size=(2, 2),
data_format="channels_last"
)
)
cnn2.add(Activation("relu"))
cnn2.add(
MaxPooling2D(
pool_size=(2, 2),
data_format="channels_last"
)
)
# Adds dropout layer
cnn2.add(Dropout(0.2))
# Adds 1 layer on flattened output
cnn2.add(Flatten())
cnn2.add(Dense(512, activation="relu"))
cnn2.add(Dense(256, activation="relu"))
cnn2.add(Dense(128, activation="relu"))
# Adds dropout layer
cnn2.add(Dropout(0.4))
cnn2.add(Dense(n_classes, activation="sigmoid"))
# Compiles the model
cnn2.compile(
optimizer="Adamax",
loss="mean_squared_error",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to training data
history2 = cnn2.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=150,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=50
)
],
)
clear_screen()
# Prints model summary
cnn2.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_2 (Conv2D) (None, 64, 64, 32) 320
conv2d_3 (Conv2D) (None, 64, 64, 64) 18496
activation_5 (Activation) (None, 64, 64, 64) 0
max_pooling2d_4 (MaxPooling (None, 32, 32, 64) 0
2D)
activation_6 (Activation) (None, 32, 32, 64) 0
max_pooling2d_5 (MaxPooling (None, 16, 16, 64) 0
2D)
dropout (Dropout) (None, 16, 16, 64) 0
flatten_3 (Flatten) (None, 16384) 0
dense_9 (Dense) (None, 512) 8389120
dense_10 (Dense) (None, 256) 131328
dense_11 (Dense) (None, 128) 32896
dropout_1 (Dropout) (None, 128) 0
dense_12 (Dense) (None, 2) 258
=================================================================
Total params: 8,572,418
Trainable params: 8,572,418
Non-trainable params: 0
_________________________________________________________________
Data augmentation is a technique used to artificially increase the size of a training dataset by creating new, slightly modified versions of existing images. This can help the model learn more robust features and generalize better to new, unseen images.
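The idea can be illustrated without Keras at all. The snippet below uses plain NumPy flips and shifts as stand-ins for the random transformations that ImageDataGenerator applies; the toy 2x2 image is hypothetical, but the principle is the same - each transformed copy keeps the original label.

```python
import numpy as np

# A toy 2x2 "image" with one channel.
img = np.array([[[1], [2]],
                [[3], [4]]])

flipped_lr = np.fliplr(img)              # horizontal flip: columns reversed
flipped_ud = np.flipud(img)              # vertical flip: rows reversed
shifted = np.roll(img, shift=1, axis=1)  # crude 1-pixel horizontal shift

# Each variant is a "new" training example that keeps the same label,
# which is how augmentation enlarges the dataset without new photos.
```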
Below, we use a data generator to feed the CNN model cnn2_flow with augmented data during the training process. Specifically, fit is called on the iterator returned by the generator's flow method, and it returns a history object containing the training and validation metrics over the epochs. The generator datagen takes in the training data and applies random transformations to create new training images; its flow method returns an iterator that provides a batch of images and labels on each iteration. The batch_size parameter specifies the number of samples per gradient update, and the verbose parameter controls the verbosity of the output during training.
Early stopping is a regularization technique used in machine learning to prevent overfitting by stopping training when the performance on the validation set stops improving.
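A simplified sketch of that early-stopping rule is below. It mirrors the monitor="val_loss" / patience / start_from_epoch settings used in the cells that follow, but it is our own illustration, not the Keras callback's implementation.

```python
def early_stopping_epoch(val_losses, patience=3, start_from_epoch=0):
    """Return the epoch index at which training would stop, given the
    validation loss observed at each epoch."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if epoch < start_from_epoch:
            continue  # don't monitor improvement yet
        if loss < best:
            best, wait = loss, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch  # no improvement for `patience` epochs: stop
    return len(val_losses) - 1  # ran through the full epoch budget

# Validation loss improves, then fails to improve for three straight epochs:
stop = early_stopping_epoch([0.9, 0.5, 0.4, 0.41, 0.42, 0.43, 0.3])
print(stop)  # 5
```

Note that training halts at epoch 5 and never sees the later improvement to 0.3 - which is exactly the risk discussed later when we ask whether early stopping kept a model from fully converging.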
# Creates a CNN with convolution layer and max pooling for use with
# flow generator.
cnn2_flow = models.Sequential()
cnn2_flow.add(
layers.Conv2D(
filters=32,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn2_flow.add(
layers.Conv2D(
filters=64,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)
)
cnn2_flow.add(Activation("relu"))
cnn2_flow.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn2_flow.add(Activation("relu"))
cnn2_flow.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
# Adds dropout layer
cnn2_flow.add(Dropout(0.2))
# Adds 1 layer on flattened output
cnn2_flow.add(Flatten())
# cnn2_flow.add(Dense(1024, activation="relu"))
cnn2_flow.add(Dense(512, activation="relu"))
cnn2_flow.add(Dense(256, activation="relu"))
cnn2_flow.add(Dense(128, activation="relu"))
# Adds dropout layer
cnn2_flow.add(layers.Dropout(0.4))
cnn2_flow.add(Dense(n_classes, activation="sigmoid"))
# Compiles the model
cnn2_flow.compile(
optimizer="Adamax",
loss="mean_squared_error",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to training data using flow generator
history2_2 = cnn2_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=50),
steps_per_epoch=len(x_train) // 50,
epochs=150,
shuffle=True,
verbose=1,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss",
patience=3,
start_from_epoch=50,
restore_best_weights=True,
)
],
)
clear_screen()
# Prints model summary
cnn2_flow.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_4 (Conv2D) (None, 64, 64, 32) 320
conv2d_5 (Conv2D) (None, 64, 64, 64) 18496
activation_7 (Activation) (None, 64, 64, 64) 0
max_pooling2d_6 (MaxPooling (None, 32, 32, 64) 0
2D)
activation_8 (Activation) (None, 32, 32, 64) 0
max_pooling2d_7 (MaxPooling (None, 16, 16, 64) 0
2D)
dropout_2 (Dropout) (None, 16, 16, 64) 0
flatten_4 (Flatten) (None, 16384) 0
dense_13 (Dense) (None, 512) 8389120
dense_14 (Dense) (None, 256) 131328
dense_15 (Dense) (None, 128) 32896
dropout_3 (Dropout) (None, 128) 0
dense_16 (Dense) (None, 2) 258
=================================================================
Total params: 8,572,418
Trainable params: 8,572,418
Non-trainable params: 0
_________________________________________________________________
The code below uses the plot_model function from the utils module to plot a visualization of the model created in the previous code block.
The plot_model function takes several arguments:
The output of the plot_model function is a visualization of the CNN model in a PNG image file. This can be useful for visualizing the structure of the model, and for communicating the model to others.
# Plots the graph
utils.plot_model(
cnn2,
to_file="model.png",
show_shapes=True,
show_layer_names=True,
rankdir="LR",
expand_nested=False,
dpi=96,
)
Below we define two arrays, y_predict_proba and y_predict, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained cnn2 model. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = cnn2.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict))
13/13 [==============================] - 0s 2ms/step
precision recall f1-score support
0 1.00 1.00 1.00 206
1 1.00 1.00 1.00 201
micro avg 1.00 1.00 1.00 407
macro avg 1.00 1.00 1.00 407
weighted avg 1.00 1.00 1.00 407
samples avg 1.00 1.00 1.00 407
Below we again define two arrays, y_predict_proba_flow and y_predict_flow, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained CNN model with Flow Generation and Early Stopping. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict_flow.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba_flow = cnn2_flow.predict(x_test)
y_predict_flow = np.round(y_predict_proba_flow)
# Prints classification report
print(classification_report(y_test_ohe, y_predict_flow, zero_division=0))
13/13 [==============================] - 0s 2ms/step
precision recall f1-score support
0 0.97 0.82 0.89 206
1 0.84 0.98 0.90 201
micro avg 0.90 0.89 0.90 407
macro avg 0.91 0.90 0.89 407
weighted avg 0.91 0.89 0.89 407
samples avg 0.89 0.89 0.89 407
Note: the graph starts the number of epochs at 0 instead of 1 (epoch 0 is actually epoch 1, epoch 1 is actually epoch 2, and epoch 14 is actually epoch 15). For the sake of clarity, we will use the graph's numbering notation when referring to epochs in the analysis below.
Observing the performance of CNN 2 with the dataset, the curve is mostly stable and the loss function output decreases as it goes through the epochs, starting at a loss value of 0.25 and ending with a loss value near 0.
With the Flow Generator CNN, the curve decreases on average, but not to the extent that our non-Flow Generator CNN did. The curve starts with a loss value of 0.25, decreasing but not as steadily as the CNN 2 curve. The curve is also much more volatile than the function representing CNN 2 performance with the training dataset, and it finishes with a higher loss value (0.10) than the other curve.
This may be because the Flow Generator introduces more noise and variability in the dataset, leading to a less stable learning process for the CNN. Additionally, the use of data augmentation techniques in the Flow Generator may have altered the characteristics of the images in ways that were not beneficial for the model's ability to learn and generalize to new data. These factors may have contributed to the slower and less effective learning process of the CNN with the Flow Generator compared to the CNN without it.
# Variables for determining the loss over epochs
epochs = history2.epoch
loss = history2.history["loss"]
epochs_flow = history2_2.epoch
loss_flow = history2_2.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss, label="CNN w/ Early Stopping")
plt.plot(epochs_flow, loss_flow, label="CNN (Flow Generator) w/ Early Stopping")
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.legend()
plt.show()
For this run, our initial accuracy score is 1, or 100% accuracy. This is a perfect score and indicates a very well performing and rounded model.
We can easily determine the misclassification rate by calculating 1.0 minus the accuracy score. In our case, we observe a misclassification rate of 0.00 - or 0%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. In our case, the precision score is 1, or 100% - matching our accuracy. High precision is ideal for a viable field-use solution, as it is important to minimize false positives and maximize true positives.
Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., the sum of true positive and false negative predictions). Our recall score for this run was 1, or 100% - the same as our accuracy.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between the two. It is calculated as 2 * (precision * recall) / (precision + recall). With precision and recall both at 1, our F1-score for this run was also 1, or 100%.
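The definitions above can be checked by hand from a 2x2 confusion matrix. The counts below are hypothetical, purely for illustration - they are not the counts from this run:

```python
# Hypothetical confusion-matrix counts (not taken from the run above):
tp, fp, fn, tn = 90, 10, 5, 95

accuracy = (tp + tn) / (tp + fp + fn + tn)          # fraction correct overall
misclassification = 1.0 - accuracy                  # 1 minus accuracy
precision = tp / (tp + fp)                          # of predicted positives
recall = tp / (tp + fn)                             # of actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(accuracy, 4), round(misclassification, 4), round(precision, 4))  # 0.925 0.075 0.9
print(round(recall, 4), round(f1, 4))  # 0.9474 0.9231
```

Note how the F1 (0.9231) lands between precision (0.9) and recall (0.9474), but closer to the smaller of the two - the harmonic mean punishes imbalance.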
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN Confusion Matrix"
)
In the below confusion matrix, we will use several prediction scores to measure our CNN 2 with Flow Generator model's predictions. Overall it is a good model, but it does perform worse than the CNN model without the Flow Generator technique.
For this run, our initial accuracy score is .8722 or 87% accuracy. This is a good score and indicates a well performing model, but it is 13% less accurate than the previous model.
We can easily determine the misclassification rate by calculating 1.0 minus the accuracy score. In our case, we observe a misclassification rate of 0.1278 - or 13%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. In our case, the precision score is 0.8928 or 89%. This is slightly higher than our accuracy value, which is ideal for a viable field-use solution, as it is important to minimize false positives and maximize true positives.
Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .8722 or 87%.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .8697 or roughly 87%.
class_report = metrics.classification_report(
y_test_ohe, y_predict_flow, output_dict=True, zero_division=0
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict_flow, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN (With Flow Generator) Confusion Matrix"
)
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 2 model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 1.0 - which indicates that the model has excellent ability to distinguish between positive and negative cases.
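The threshold sweep behind the ROC curve can be sketched in plain NumPy. The `roc_points` helper and the example scores below are ours, purely illustrative (a minimal sketch with no tie handling), not this model's outputs:

```python
import numpy as np

def roc_points(y_true, scores):
    """Sweep the decision threshold from the most to the least confident
    prediction, recording the (FPR, TPR) pair after each additional
    example is declared positive."""
    order = np.argsort(-np.asarray(scores))      # descending by score
    y = np.asarray(y_true)[order]
    tpr = np.cumsum(y) / y.sum()                 # true positive rate
    fpr = np.cumsum(1 - y) / (len(y) - y.sum())  # false positive rate
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier scores
fpr, tpr = roc_points(y_true, scores)
# Trapezoidal area under the (FPR, TPR) curve:
auc_value = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
print(auc_value)  # 0.75
```

An AUC of 0.5 would match the dashed random-guess diagonal in the plots below, while 1.0 (as in our run) means some threshold separates the two classes perfectly.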
# Variables for determining the ROC/AUC
fpr, tpr, threshold = roc_curve(y_test, np.argmax(y_predict_proba, axis=1))
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 2 - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 2 (with Flow Generator) model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is .87 - which is lower than the AUC for the regular CNN 2 model.
# Variables for determining the ROC/AUC
fpr, tpr, threshold = roc_curve(y_test, np.argmax(y_predict_proba_flow, axis=1))
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 2 (Flow Generator) - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The below code plots the training versus testing graph for the CNN 2 model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 2 model with Flow Generator is improving over time, and that it is performing well on both the training and validation sets.
We do observe the validation loss curve to be slightly more sporadic than the training curve. This may be due to overfitting. It is possible that the model is trained too well on the training data, so much so that it begins to memorize the data instead of learning from it. As a result, the model performs very well on the training data, but poorly on new, unseen data, which is what the validation set represents. The sporadic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
# Model history values
hist_values = list(history2.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 2")
plt.tight_layout()
plt.show()
The below code plots the training versus testing graph for the CNN 2 with Flow Generator model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, although we note that the validation loss oscillates and spikes sharply near the final epochs. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 2 model with Flow Generator is improving over time but we do notice lower Recall and lower AUC scores when compared to the previous CNN 2 Model.
As before, we observe the curve of this model to be much more sporadic than the previous curve. This may be due to the fact that the Flow Generator is introducing additional noise or variability in the data, making it harder for the model to learn and generalize well.
# Model history values
hist_values = list(history2_2.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 2 With Flow Generator")
plt.tight_layout()
plt.show()
In the below heat map, the x-axis represents the model's prediction while the y-axis represents the correct or actual value.
In comparing the CNN 2 and MLP heat maps, we see that the CNN 2 heat map gets a 100% correct prediction on citrus cankers - whereas the MLP correctly predicted citrus cankers 95.15% of the time.
We also see that MLP correctly predicted black spot 95.52% of the time - as compared to CNN 2, which predicted black spot correctly 100% of the time. This means that CNN 2 was about 5% more accurate in its predictions of citrus cankers and about 4% more accurate in predicting black spots than MLP.
Overall, we see excellent performance and accuracy from both CNN 2 and MLP - however, CNN 2 certainly seems to perform markedly better than MLP based on the heatmaps.
One reason for the differences in performance could be the architectural differences between CNN 2 and MLP. Convolutional Neural Networks are designed to capture spatial relationships in the data by applying filters to identify patterns and features. On the other hand, Multi-Layer Perceptrons are a type of feedforward neural network that can learn non-linear relationships between input and output data.
Thus, CNN 2 may perform better when dealing with image data, as it can better capture the spatial relationships between pixels. Meanwhile, MLP may perform better with data that is more structured and can be better represented by a set of features or attributes. Therefore, depending on the nature of the data and the problem at hand, the choice of architecture can have a significant impact on the performance and accuracy of the model.
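One concrete way to see the architectural difference is parameter sharing: a Conv2D layer's weight count depends only on its kernel size and channel counts (the same small filter is reused at every pixel position), while a Dense layer pays for every connection. The two helper functions below are ours, but the counts they produce reproduce two entries from the cnn2 summary printed earlier:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one
    # bias, shared across every spatial position of the feature map.
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(fan_in, units):
    # A dense layer connects every input unit to every output unit,
    # plus one bias per output unit.
    return (fan_in + 1) * units

print(conv2d_params(3, 3, 32, 64))      # 18496  (conv2d_3 in the summary)
print(dense_params(16 * 16 * 64, 512))  # 8389120 (dense_9 on the flattened 16x16x64 map)
```

The 3x3 convolution over a 64x64 image costs the same 18,496 parameters it would over any image size, whereas the first dense layer after flattening dominates the model's 8.5M-parameter budget.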
compare_mlp_cnn(
cnn2,
mlp,
x_test,
y_test,
title_1="CNN 2",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 916us/step
In comparing the CNN 2 (w/ Flow Generator) and MLP heat maps, we see that the CNN heat map gets a 75.24% correct prediction on citrus cankers - whereas the MLP correctly predicted citrus cankers 95.15% of the time. We also see that MLP correctly predicted black spot 95.52% of the time - as compared to CNN 2 (Flow Generator), which predicted black spot correctly 99% of the time. This means that the CNN was about 20% less accurate in its predictions of citrus cankers but about 4% more accurate in predicting black spots than MLP.
Overall, we see poorer performance in predicting citrus cankers but an increase in predicting black spots for CNN 2 (w/ Flow Generator).
This may be due to several reasons. One reason could be the difference in the architecture of the models. CNNs are generally better suited for image data, which is the type of data being used in this case, but MLPs can also perform well with image data if properly designed. Another reason could be the difference in the number of parameters between the two models. MLPs typically have more parameters than CNNs, which can make them more flexible in modeling complex relationships in the data. However, this can also lead to overfitting if not properly regularized. It's also possible that the difference in performance could be due to differences in the training process, such as the choice of optimizer, learning rate, or batch size.
compare_mlp_cnn(
cnn2_flow,
mlp,
x_test,
y_test,
title_1="CNN 2 (Flow Generator)",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 958us/step
In comparing the CNN 2 (w/ Flow Generator) and standard CNN 2 heat maps, we see that the standard CNN 2 heat map gets a 100% correct prediction on citrus cankers - whereas the CNN 2 (w/ Flow Generator) correctly predicted citrus cankers 75.24% of the time. We also see that CNN 2 correctly predicted black spot 100% of the time - as compared to CNN 2 (Flow Generator), which predicted black spot correctly 99% of the time. This means that the Flow Generator model was about 25% less accurate in its predictions of citrus cankers and about 1% less accurate in predicting black spots than the standard CNN 2.
Overall, we see poorer performance and accuracy from CNN 2 w/ Flow Generation, but primarily just when predicting citrus cankers.
This may be due to several reasons. One possibility is that the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn2,
cnn2_flow,
x_test,
y_test,
title_1="CNN 2",
title_2="CNN 2 (Flow Generator)",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 2ms/step
Similar to what we have done above for cnn1 and cnn2, we create a convolutional neural network called cnn3, which has several convolutional layers with different numbers of filters, followed by max-pooling layers, a dense layer, and a sigmoid output layer. The network is trained using the binary cross-entropy loss function, the Adamax optimizer, and the Recall and AUC metrics. The model is trained using the fit() method, and the EarlyStopping callback is used to stop training if the validation loss doesn't improve for three consecutive epochs (monitoring begins at epoch 50), restoring the best weights seen.
The main difference between cnn3 and cnn2 is the number and arrangement of convolutional layers, as well as the use of regularization in cnn3 through the kernel_regularizer argument, which adds L2 regularization to the kernel weights. Additionally, cnn3 uses Glorot/Bengio initialization for the output layer, and He uniform initialization for the ReLU layers.
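The two initializers differ only in the bound of the uniform distribution they sample from. As documented for Keras's he_uniform and glorot_uniform, the bounds are sqrt(6 / fan_in) and sqrt(6 / (fan_in + fan_out)) respectively; the small helpers below just evaluate those formulas (the example fan values are illustrative):

```python
import math

def he_uniform_limit(fan_in):
    # he_uniform draws from U(-limit, limit) with limit = sqrt(6 / fan_in),
    # sized so ReLU activations keep roughly unit variance layer to layer.
    return math.sqrt(6.0 / fan_in)

def glorot_uniform_limit(fan_in, fan_out):
    # glorot_uniform balances forward and backward signal variance by
    # averaging over fan-in and fan-out.
    return math.sqrt(6.0 / (fan_in + fan_out))

# For a 3x3 conv with 32 input channels (fan_in = 3*3*32 = 288):
print(round(he_uniform_limit(288), 4))       # ~0.1443
# For the final Dense(n_classes=2) layer fed by 128 units:
print(round(glorot_uniform_limit(128, 2), 4))  # ~0.2148
```

The practical point is that deeper layers with larger fan-in start with proportionally smaller weights, which helps the regularized network below train stably from the first epoch.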
# Use Kaiming He to regularize ReLU layers: https://arxiv.org/pdf/1502.01852.pdf
# Use Glorot/Bengio for linear/sigmoid/softmax: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
l2_lambda = 1e-4
cnn3 = Sequential()
cnn3.add(
Conv2D(
filters=32,
input_shape=(img_width, img_height, n_dimensions),
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3.add(
Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn3.add(
Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3.add(
Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
)
)
cnn3.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn3.add(
Conv2D(
filters=128,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
) # more compact syntax
cnn3.add(
Conv2D(
filters=128,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
# add one layer on flattened output
cnn3.add(Flatten())
cnn3.add(Dropout(0.25)) # add some dropout for regularization after conv layers
cnn3.add(
Dense(
128,
activation="relu",
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
)
)
cnn3.add(Dropout(0.5)) # add some dropout for regularization, again!
cnn3.add(
Dense(
n_classes,
activation="sigmoid",
kernel_initializer="glorot_uniform",
kernel_regularizer=l2(l2_lambda),
)
)
# Let's train the model
cnn3.compile(
optimizer="Adamax",
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
history3 = cnn3.fit(
x_train,
y_train_ohe,
batch_size=128,
steps_per_epoch=len(x_train) // 128,
epochs=150,
verbose=1,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss",
patience=3,
start_from_epoch=50,
restore_best_weights=True,
)
],
)
clear_screen()
# Prints model summary
cnn3.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 64, 64, 32) 320
conv2d_7 (Conv2D) (None, 64, 64, 32) 9248
max_pooling2d_8 (MaxPooling (None, 32, 32, 32) 0
2D)
conv2d_8 (Conv2D) (None, 32, 32, 64) 18496
conv2d_9 (Conv2D) (None, 32, 32, 64) 36928
max_pooling2d_9 (MaxPooling (None, 16, 16, 64) 0
2D)
conv2d_10 (Conv2D) (None, 16, 16, 128) 73856
conv2d_11 (Conv2D) (None, 16, 16, 128) 147584
flatten_5 (Flatten) (None, 32768) 0
dropout_4 (Dropout) (None, 32768) 0
dense_17 (Dense) (None, 128) 4194432
dropout_5 (Dropout) (None, 128) 0
dense_18 (Dense) (None, 2) 258
=================================================================
Total params: 4,481,122
Trainable params: 4,481,122
Non-trainable params: 0
_________________________________________________________________
# Use Kaiming He to regularize ReLU layers: https://arxiv.org/pdf/1502.01852.pdf
# Use Glorot/Bengio for linear/sigmoid/softmax: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf
l2_lambda = 1e-4
cnn3_flow = Sequential()
cnn3_flow.add(
Conv2D(
filters=32,
input_shape=(img_width, img_height, n_dimensions),
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3_flow.add(
Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3_flow.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn3_flow.add(
Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3_flow.add(
Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
)
)
cnn3_flow.add(MaxPooling2D(pool_size=(2, 2), data_format="channels_last"))
cnn3_flow.add(
Conv2D(
filters=128,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
cnn3_flow.add(
Conv2D(
filters=128,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)
)
# add one layer on flattened output
cnn3_flow.add(Flatten())
cnn3_flow.add(Dropout(0.25)) # add some dropout for regularization after conv layers
cnn3_flow.add(
Dense(
128,
activation="relu",
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
)
)
cnn3_flow.add(Dropout(0.5)) # add some dropout for regularization, again!
cnn3_flow.add(
Dense(
n_classes,
activation="softmax",
kernel_initializer="glorot_uniform",
kernel_regularizer=l2(l2_lambda),
)
)
# Compile and train the model
cnn3_flow.compile(
optimizer="Adamax",
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
history3_3 = cnn3_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=128),
steps_per_epoch=len(x_train) // 128,
epochs=150,
verbose=1,
shuffle=True,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss",
patience=3,
start_from_epoch=50,
restore_best_weights=True,
)
],
)
clear_screen()
# Prints model summary
cnn3.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_6 (Conv2D) (None, 64, 64, 32) 320
conv2d_7 (Conv2D) (None, 64, 64, 32) 9248
max_pooling2d_8 (MaxPooling (None, 32, 32, 32) 0
2D)
conv2d_8 (Conv2D) (None, 32, 32, 64) 18496
conv2d_9 (Conv2D) (None, 32, 32, 64) 36928
max_pooling2d_9 (MaxPooling (None, 16, 16, 64) 0
2D)
conv2d_10 (Conv2D) (None, 16, 16, 128) 73856
conv2d_11 (Conv2D) (None, 16, 16, 128) 147584
flatten_5 (Flatten) (None, 32768) 0
dropout_4 (Dropout) (None, 32768) 0
dense_17 (Dense) (None, 128) 4194432
dropout_5 (Dropout) (None, 128) 0
dense_18 (Dense) (None, 2) 258
=================================================================
Total params: 4,481,122
Trainable params: 4,481,122
Non-trainable params: 0
_________________________________________________________________
The code below uses the plot_model function from the utils module to plot a visualization of the model created in the previous code block.
The plot_model function takes several arguments, including the model to plot, a to_file path for the output image, the show_shapes and show_layer_names flags, a rankdir orientation, expand_nested, and dpi.
The output of the plot_model function is a visualization of the CNN model in a PNG image file. This can be useful for visualizing the structure of the model, and for communicating the model to others.
# Plots the graph
utils.plot_model(
cnn3,
to_file="model.png",
show_shapes=True,
show_layer_names=True,
rankdir="LR",
expand_nested=False,
dpi=96,
)
Below we define two arrays, y_predict_proba and y_predict, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained cnn3 model. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = cnn3.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict))
13/13 [==============================] - 0s 3ms/step
precision recall f1-score support
0 0.92 0.86 0.89 206
1 0.85 0.93 0.89 201
micro avg 0.88 0.89 0.89 407
macro avg 0.89 0.89 0.89 407
weighted avg 0.89 0.89 0.89 407
samples avg 0.89 0.89 0.89 407
/home/ocean_trader/anaconda3/envs/lab6/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result))
Below we again define two arrays, y_predict_proba_flow and y_predict_flow, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained CNN model with Flow Generation and Early Stopping. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict_flow.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba_flow = cnn3_flow.predict(x_test)
y_predict_flow = np.round(y_predict_proba_flow)
# Prints classification report
print(classification_report(y_test_ohe, y_predict_flow, zero_division=0))
13/13 [==============================] - 0s 3ms/step
precision recall f1-score support
0 0.73 0.89 0.80 206
1 0.85 0.66 0.74 201
micro avg 0.77 0.77 0.77 407
macro avg 0.79 0.77 0.77 407
weighted avg 0.79 0.77 0.77 407
samples avg 0.77 0.77 0.77 407
Note: the graph numbers epochs starting at 0 instead of 1 (epoch 0 is actually epoch 1, epoch 1 is actually epoch 2, epoch 14 is actually epoch 15). For the sake of clarity, I will use the graph's numbering notation when referring to epochs in the analysis below.
Observing the performance of CNN 3 with the dataset, the curve of the CNN with only Early Stopping is stable, with the loss decreasing steadily over the epochs. It starts at a loss value of 1.6 and ends near 0.6 as it approaches epoch #55.
With the Flow Generator CNN, the curve also decreases on average, and this model outperforms the Early-Stopping-only CNN by a small margin. It starts at a loss value of 1.3 and ends below 0.6 at epoch #60.
The use of data augmentation techniques in the Flow Generator may have altered the characteristics of the images in ways that were beneficial for the model's ability to learn and generalize to new data. These factors may have contributed to the faster and more effective learning process of the CNN with the Flow Generator compared to the CNN without it.
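The actual datagen configuration is defined earlier in the notebook; as a self-contained illustration of the kind of transformations a flow generator applies, here is a minimal NumPy sketch of random horizontal flips and shifts (the flip probability and shift range are illustrative assumptions, not the notebook's actual augmentation settings):

```python
import numpy as np

def augment_batch(x, rng):
    """Apply simple random augmentations of the kind a Keras
    ImageDataGenerator performs: horizontal flips and small shifts."""
    out = x.copy()
    for i in range(len(out)):
        if rng.random() < 0.5:  # random horizontal flip (illustrative probability)
            out[i] = out[i, :, ::-1, :].copy()
        shift = int(rng.integers(-3, 4))  # random horizontal shift of up to 3 pixels
        # np.roll wraps pixels around; ImageDataGenerator instead fills edges
        out[i] = np.roll(out[i], shift, axis=1)
    return out

rng = np.random.default_rng(0)
x_demo = rng.random((8, 64, 64, 1), dtype=np.float32)  # batch of fake grayscale images
aug = augment_batch(x_demo, rng)
print(aug.shape)  # (8, 64, 64, 1)
```

Because each epoch draws freshly transformed batches, the model rarely sees the exact same image twice, which is the regularizing effect discussed above.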
# Variables for determining the loss over epochs
epochs = history3.epoch
loss = history3.history["loss"]
epochs_flow = history3_3.epoch
loss_flow = history3_3.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss, label="CNN w/ Early Stopping")
plt.plot(epochs_flow, loss_flow, label="CNN (Flow Generator) w/ Early Stopping")
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.legend()
plt.show()
In this run, the model correctly predicted 174 citrus canker images and 166 black spot images. The distribution of false positives and false negatives (32 and 35) is balanced, meaning there is no significant bias causing the model to perform poorly on a specific class.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .8354, or roughly 83% accuracy. This score is near the accuracy of the CNN model with the Early Stopping technique, being about 3% higher.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.1646 or 16%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .8342 or 83%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .8280 or 82% - near our accuracy value.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .8311 or roughly 83%.
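These formulas can be checked directly against this run's confusion-matrix counts (174 and 166 correct predictions, 32 and 35 errors); a minimal sketch, noting that the report's precision, recall, and F1 figures are weighted averages over both classes, so the per-class values computed here bracket them:

```python
import numpy as np

# Confusion-matrix counts from this run (approximate - they vary run to run):
# rows = actual class, columns = predicted class
cm = np.array([[174, 32],    # actual citrus canker
               [ 35, 166]])  # actual black spot

total = cm.sum()
correct = np.trace(cm)           # diagonal entries are correct predictions
accuracy = correct / total       # (174 + 166) / 407
misclass = 1.0 - accuracy        # fraction of incorrect predictions

# Per-class precision/recall/F1, treating each class in turn as "positive"
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp         # predicted as the class but actually the other
fn = cm.sum(axis=1) - tp         # actually the class but predicted as the other
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 4))   # 0.8354
print(round(misclass, 4))   # 0.1646
```

The accuracy and misclassification values reproduce the figures quoted above exactly, since both depend only on the four confusion-matrix counts.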
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN Confusion Matrix"
)
/home/ocean_trader/anaconda3/envs/lab6/lib/python3.10/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in samples with no predicted labels. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result))
In this run, the model correctly predicted 153 citrus canker images and 176 black spot images. The distribution of false positives and false negatives (25 and 53) is less balanced in comparison to the previous model. This model struggled by a noticeable margin, predicting black spot for images that were actually citrus canker.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .8084, or roughly 80% accuracy. This score is slightly lower than the previous model's accuracy (Early Stopping without the Flow Generator) - a mere 3% drop in accuracy.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.1916 or 19%. This increase follows logically from the drop in accuracy.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .8146 or 81%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., the sum of true positive and false negative predictions). Our recall score for this run was .8084 or 81% - matching our accuracy value.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .8076 or roughly 80%.
class_report = metrics.classification_report(
y_test_ohe, y_predict_flow, output_dict=True, zero_division=0
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict_flow, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN (With Flow Generator) Confusion Matrix"
)
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 3 model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.83, which indicates that the model has a good ability to distinguish between positive and negative cases.
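One way to build intuition for the AUC is its rank-statistic interpretation: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal NumPy sketch on toy scores (illustrative data, not the notebook's):

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC via the rank-statistic (Mann-Whitney U) formulation:
    the probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # count positive/negative pairs where the positive scores higher;
    # tied scores count as half a win
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Toy labels and predicted positive-class probabilities
y_true_demo = [0, 0, 1, 1, 0, 1]
y_prob_demo = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(auc_score(y_true_demo, y_prob_demo))  # 8 of 9 pairs ranked correctly, ~0.889
```

Note that this interpretation relies on continuous scores: feeding hard 0/1 predictions into an ROC computation collapses the curve to a single operating point.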
# Variables for determining the ROC/AUC
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba[:, 1])  # use positive-class probabilities so the curve is traced over thresholds
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 3 - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 3 model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.81, which indicates that the model has a good ability to distinguish between positive and negative cases, but performed slightly worse than the previous model.
# Variables for determining the ROC/AUC
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba_flow[:, 1])  # use positive-class probabilities so the curve is traced over thresholds
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 3 (Flow Generator) - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The below code plots the training versus testing graph for the CNN 3 model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases. Despite the validation curve displaying volatile behavior, the overall behavior shows that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 3 model is improving over time, and that it is performing well on both the training and validation sets.
We do observe the validation loss curve to be slightly more sporadic than the training curve. This may be due to overfitting: the model may fit the training data so well that it begins to memorize it instead of learning from it. As a result, the model performs very well on the training data but worse on new, unseen data, which is what the validation set represents. The somewhat sporadic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
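One way to see the trend beneath a volatile validation curve is to smooth it, for example with the Savitzky-Golay filter imported at the top of the notebook. A sketch on synthetic data (the decay rate and noise level here are illustrative assumptions, not the model's actual history):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy "validation loss": an exponential decay plus random jitter
rng = np.random.default_rng(42)
epochs_demo = np.arange(60)
val_loss_demo = 1.5 * np.exp(-epochs_demo / 20) + 0.1 * rng.standard_normal(60)

# Savitzky-Golay smoothing: fit a cubic polynomial over an 11-epoch sliding window
smoothed = savgol_filter(val_loss_demo, window_length=11, polyorder=3)

print(smoothed.shape)  # (60,)
```

Plotting the smoothed series alongside the raw one makes the overall downward trend visible without the epoch-to-epoch jumpiness.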
# Model history values
hist_values = list(history3.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 3 With Early Stopping")
plt.tight_layout()
plt.show()
The below code plots the training versus testing graph for the CNN 3 model with both Flow Generator and Early Stopping. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. In this case, we can see that the training loss and validation loss decrease as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases. The overall behavior shows that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 3 model with Flow Generator is improving over time, and that it is performing well on both the training and validation sets.
We do observe the validation loss curve to be slightly more sporadic than the training curve, though not as much as the previous model's validation graph. This may be due to overfitting: the model may fit the training data so well that it begins to memorize it instead of learning from it. As a result, the model performs very well on the training data but worse on new, unseen data, which is what the validation set represents. The somewhat sporadic behavior of the validation loss curve could be an indication that the model may not generalize well to new data.
# Model history values
hist_values = list(history3_3.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 3 With Flow Generator")
plt.tight_layout()
plt.show()
In the below heat map, the x-axis represents the model's prediction while the y-axis represents the correct or actual value.
In comparing the CNN3 and MLP heat maps, we see that CNN3 correctly predicted citrus cankers 83.01% of the time - whereas the MLP correctly predicted citrus cankers 95.15% of the time. We also see that the MLP correctly predicted black spot 95.52% of the time - compared to CNN3, which predicted black spot correctly 83.08% of the time. The CNN3 model was about 12% less accurate overall in its citrus canker and black spot predictions than the MLP.
Overall, we see good performance and accuracy from CNN3 and MLP - however, MLP outperforms our model by a good margin based on the heatmaps.
One reason for this could be related to the architecture of the models themselves. MLP models are typically fully connected and use a simple feedforward network, while CNN models use convolutional layers to extract features from the input data. The architecture of the CNN3 model may not have been optimal for the specific task of classifying citrus canker and black spot images, which could have led to lower accuracy compared to the MLP model. Additionally, the CNN3 model may not have been trained for enough epochs or with the optimal hyperparameters to reach its full potential.
compare_mlp_cnn(
cnn3,
mlp,
x_test,
y_test,
title_1="CNN 3",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 856us/step
In comparing the CNN3 with Flow Generator and MLP heat maps, we see that the CNN3 model correctly predicted citrus cankers 74.27% of the time - whereas the MLP correctly predicted citrus cankers 95.15% of the time. We also see that the MLP correctly predicted black spot 95.52% of the time - compared to the CNN, which predicted black spot correctly 87.56% of the time. The CNN3 with Flow Generator model was roughly 21% less accurate than the MLP on the citrus canker class, and 8% less accurate on the black spot images.
Overall, we see good performance and accuracy from CNN3 and MLP - however, MLP outperforms our model by a good margin based on the heatmaps.
This may be due to several reasons. One possibility is that the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn3_flow,
mlp,
x_test,
y_test,
title_1="CNN 3 (Flow Generator)",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 3ms/step 13/13 [==============================] - 0s 533us/step
In comparing the heat maps of the CNN3 without Flow Generator and the CNN3 with Flow Generator, we see that the CNN3 model correctly predicted citrus cankers 83.01% of the time - whereas the Flow Generator model did so 74.27% of the time. We also see that the Flow Generator model correctly predicted black spot 87.56% of the time - compared to the CNN without Flow Generator, which predicted black spot correctly 83.08% of the time. The CNN3 with Flow Generator model was roughly 9% less accurate than the CNN without Flow Generator on the citrus canker class, but 4% more accurate on the black spot images.
Overall, we see good performance and accuracy from both CNN3 models - however, the CNN3 without Flow Generator performs better when considering both classes.
This may be due to several reasons. One possibility is that the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn3,
cnn3_flow,
x_test,
y_test,
title_1="CNN 3",
title_2="CNN 3 (Flow Generator)",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 3ms/step
Similar to the networks created above, we create a convolutional neural network called cnn4, this time using the Keras functional API with a residual (skip) connection, ReLU activation, and dropout. This model is different from cnn3 because it uses the functional API instead of a Sequential stack, adds the output of an earlier max-pooling layer back into a later convolutional output as a residual connection, and wraps its 3x3 convolution between 1x1 convolutions that expand and then reduce the number of filters.
The input shape of the model is an image with width img_width, height img_height, and depth n_dimensions (1 for our grayscale images). The model has several convolutional layers with different filter sizes and numbers of filters, max-pooling layers, and fully connected layers with ReLU activation and dropout. The model is compiled with binary cross-entropy loss, the Adamax optimizer, and recall and AUC metrics, and is trained using the fit method of Keras with Early Stopping regularization. The training is performed on the images x_train and their corresponding labels y_train_ohe, and validation is performed on the images x_test and their corresponding labels y_test_ohe.
input_holder = Input(shape=(img_width, img_height, n_dimensions))
# start with a conv layer
x = Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(input_holder)
x = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
x_split = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Conv2D(
filters=64,
kernel_size=(1, 1),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x_split)
x = Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
x = Conv2D(
filters=32,
kernel_size=(1, 1),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
# now add back in the split layer, x_split (residual added in)
x = Add()([x, x_split])
x = Activation("relu")(x)
x = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Flatten()(x)
x = Dropout(0.25)(x)
x = Dense(256)(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
x = Dense(n_classes)(x)
x = Activation("softmax")(x)
cnn4 = Model(inputs=input_holder, outputs=x)
cnn4.compile(
optimizer="Adamax",
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
history4 = cnn4.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=150,
verbose=1,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss",
patience=3,
start_from_epoch=50,
restore_best_weights=True,
)
],
)
clear_screen()
# Prints model summary
cnn4.summary()
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 64, 64, 1)] 0 []
conv2d_18 (Conv2D) (None, 64, 64, 32) 320 ['input_1[0][0]']
max_pooling2d_12 (MaxPooling2D (None, 32, 32, 32) 0 ['conv2d_18[0][0]']
)
conv2d_19 (Conv2D) (None, 32, 32, 32) 9248 ['max_pooling2d_12[0][0]']
max_pooling2d_13 (MaxPooling2D (None, 16, 16, 32) 0 ['conv2d_19[0][0]']
)
conv2d_20 (Conv2D) (None, 16, 16, 64) 2112 ['max_pooling2d_13[0][0]']
conv2d_21 (Conv2D) (None, 16, 16, 64) 36928 ['conv2d_20[0][0]']
conv2d_22 (Conv2D) (None, 16, 16, 32) 2080 ['conv2d_21[0][0]']
add (Add) (None, 16, 16, 32) 0 ['conv2d_22[0][0]',
'max_pooling2d_13[0][0]']
activation_9 (Activation) (None, 16, 16, 32) 0 ['add[0][0]']
max_pooling2d_14 (MaxPooling2D (None, 8, 8, 32) 0 ['activation_9[0][0]']
)
flatten_7 (Flatten) (None, 2048) 0 ['max_pooling2d_14[0][0]']
dropout_8 (Dropout) (None, 2048) 0 ['flatten_7[0][0]']
dense_21 (Dense) (None, 256) 524544 ['dropout_8[0][0]']
activation_10 (Activation) (None, 256) 0 ['dense_21[0][0]']
dropout_9 (Dropout) (None, 256) 0 ['activation_10[0][0]']
dense_22 (Dense) (None, 2) 514 ['dropout_9[0][0]']
activation_11 (Activation) (None, 2) 0 ['dense_22[0][0]']
==================================================================================================
Total params: 575,746
Trainable params: 575,746
Non-trainable params: 0
__________________________________________________________________________________________________
input_holder = Input(shape=(img_width, img_height, n_dimensions))
# start with a conv layer
x = Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(input_holder)
x = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Conv2D(
filters=32,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
x_split = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Conv2D(
filters=64,
kernel_size=(1, 1),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x_split)
x = Conv2D(
filters=64,
kernel_size=(3, 3),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
x = Conv2D(
filters=32,
kernel_size=(1, 1),
kernel_initializer="he_uniform",
kernel_regularizer=l2(l2_lambda),
padding="same",
activation="relu",
data_format="channels_last",
)(x)
# now add back in the split layer, x_split (residual added in)
x = Add()([x, x_split])
x = Activation("relu")(x)
x = MaxPooling2D(pool_size=(2, 2), data_format="channels_last")(x)
x = Flatten()(x)
x = Dropout(0.25)(x)
x = Dense(256)(x)
x = Activation("relu")(x)
x = Dropout(0.5)(x)
x = Dense(n_classes)(x)
x = Activation("softmax")(x)
cnn4_flow = Model(inputs=input_holder, outputs=x)
cnn4_flow.compile(
optimizer="Adamax",
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
history4_4 = cnn4_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=128),
steps_per_epoch=len(x_train) // 128,
epochs=150,
verbose=1,
validation_data=(x_test, y_test_ohe),
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss",
patience=3,
start_from_epoch=50,
restore_best_weights=True,
)
],
)
clear_screen()
# Prints model summary
cnn4_flow.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 64, 64, 1)] 0 []
conv2d_23 (Conv2D) (None, 64, 64, 32) 320 ['input_2[0][0]']
max_pooling2d_15 (MaxPooling2D (None, 32, 32, 32) 0 ['conv2d_23[0][0]']
)
conv2d_24 (Conv2D) (None, 32, 32, 32) 9248 ['max_pooling2d_15[0][0]']
max_pooling2d_16 (MaxPooling2D (None, 16, 16, 32) 0 ['conv2d_24[0][0]']
)
conv2d_25 (Conv2D) (None, 16, 16, 64) 2112 ['max_pooling2d_16[0][0]']
conv2d_26 (Conv2D) (None, 16, 16, 64) 36928 ['conv2d_25[0][0]']
conv2d_27 (Conv2D) (None, 16, 16, 32) 2080 ['conv2d_26[0][0]']
add_1 (Add) (None, 16, 16, 32) 0 ['conv2d_27[0][0]',
'max_pooling2d_16[0][0]']
activation_12 (Activation) (None, 16, 16, 32) 0 ['add_1[0][0]']
max_pooling2d_17 (MaxPooling2D (None, 8, 8, 32) 0 ['activation_12[0][0]']
)
flatten_8 (Flatten) (None, 2048) 0 ['max_pooling2d_17[0][0]']
dropout_10 (Dropout) (None, 2048) 0 ['flatten_8[0][0]']
dense_23 (Dense) (None, 256) 524544 ['dropout_10[0][0]']
activation_13 (Activation) (None, 256) 0 ['dense_23[0][0]']
dropout_11 (Dropout) (None, 256) 0 ['activation_13[0][0]']
dense_24 (Dense) (None, 2) 514 ['dropout_11[0][0]']
activation_14 (Activation) (None, 2) 0 ['dense_24[0][0]']
==================================================================================================
Total params: 575,746
Trainable params: 575,746
Non-trainable params: 0
__________________________________________________________________________________________________
# Plots the graph
utils.plot_model(
cnn4,
to_file="model.png",
show_shapes=True,
show_layer_names=True,
rankdir="LR",
expand_nested=False,
dpi=96,
)
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = cnn4.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict))
13/13 [==============================] - 0s 15ms/step
precision recall f1-score support
0 0.99 0.97 0.98 206
1 0.97 0.99 0.98 201
micro avg 0.98 0.98 0.98 407
macro avg 0.98 0.98 0.98 407
weighted avg 0.98 0.98 0.98 407
samples avg 0.98 0.98 0.98 407
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba_flow = cnn4_flow.predict(x_test)
y_predict_flow = np.round(y_predict_proba_flow)
# Prints classification report
print(classification_report(y_test_ohe, y_predict_flow, zero_division=0))
13/13 [==============================] - 0s 1ms/step
precision recall f1-score support
0 0.78 0.80 0.79 206
1 0.79 0.77 0.78 201
micro avg 0.79 0.79 0.79 407
macro avg 0.79 0.79 0.79 407
weighted avg 0.79 0.79 0.79 407
samples avg 0.79 0.79 0.79 407
# Variables for determining the loss over epochs
epochs = history4.epoch
loss = history4.history["loss"]
epochs_flow = history4_4.epoch
loss_flow = history4_4.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss, label="CNN w/ Early Stopping")
plt.plot(epochs_flow, loss_flow, label="CNN (Flow Generator) w/ Early Stopping")
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.legend()
plt.show()
In this run, the model predicted 201 citrus canker and 194 black spot images correctly, while mispredicting 12 images total. The model does not show a bias toward either class among the images it misclassified.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is 0.9705, or roughly 97% accuracy. This model performs well and predicts almost all of the images in the testing dataset correctly.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.0295 or 3%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is 0.9706, or 97%. It is the same as our accuracy value, which we feel is ideal for a field-use solution, since it is important to minimize false positives and maximize true positives.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., the sum of true positive and false negative predictions). Our recall score for this run was 0.9705, or 97%, exactly matching our accuracy.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .9705 or roughly 97%.
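As a quick check of the formulas above, the snippet below computes each metric by hand on a small invented set of labels (the arrays are hypothetical, not drawn from our dataset) and confirms they agree with scikit-learn's implementations:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical predictions for 10 test images (1 = positive class, 0 = negative class)
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives

accuracy = np.mean(y_true == y_pred)                # 8 of 10 correct -> 0.8
misclass = 1.0 - accuracy                           # 1 - accuracy -> 0.2
precision = tp / (tp + fp)                          # 4 / 5 = 0.8
recall = tp / (tp + fn)                             # 4 / 5 = 0.8
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean -> 0.8
```

The hand-computed values match `precision_score`, `recall_score`, and `f1_score` on the same arrays, which is the same arithmetic `classification_report` performs per class.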
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN Confusion Matrix"
)
In this run, the model predicted 171 citrus canker and 140 black spot images correctly, while mispredicting 96 images total, a larger total than the previous model. The model misclassified more citrus canker images than black spot images, and the root cause of the unbalanced misclassifications should be investigated further.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .7641 or roughly 76% accuracy. This model performs worse than the previous model.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.2359 or 24%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .7681 or 77%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., the sum of true positive and false negative predictions). Our recall score for this run was 0.7641, or 76%, exactly matching our accuracy.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .7630 or roughly 76%.
class_report = metrics.classification_report(
y_test_ohe, y_predict_flow, output_dict=True, zero_division=0
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict_flow, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="CNN (With Flow Generator) Confusion Matrix"
)
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 4 model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.97, indicating that the model comes close to perfectly distinguishing between positive and negative cases.
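Note that the ROC curve is traced from continuous scores rather than hard class labels: `roc_curve` sweeps the decision threshold across the predicted probabilities and records one (FPR, TPR) point per threshold. The toy example below, using invented scores rather than our model's outputs, illustrates the mechanics:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical predicted probabilities of the positive class for 4 samples
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve returns one (FPR, TPR) point per candidate threshold;
# auc integrates the resulting curve with the trapezoidal rule
fpr, tpr, thresholds = roc_curve(y_true, scores)
roc_auc = auc(fpr, tpr)  # 0.75 for these toy scores
```

Because one sample with a high score is negative, the curve bends below the perfect corner and the area comes out at 0.75 instead of 1.0.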
# Variables for determining the ROC/AUC
# Uses the positive-class probability as the score so the threshold can be swept
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba[:, 1])
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 4 - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the CNN 4 model with Flow Generator. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.76, which indicates that the model can still distinguish between positive and negative cases, but it performed significantly worse than the previous model.
# Variables for determining the ROC/AUC
# Uses the positive-class probability as the score so the threshold can be swept
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba_flow[:, 1])
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("CNN 4 (Flow Generator) - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The below code plots the training versus testing graph for the CNN 4 model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. We can see that the training loss and validation loss decrease at a stable rate as the number of epochs increases, indicating that the model is learning and improving.
The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases.
Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 4 model is improving over time, and that it is performing well on both the training and validation sets.
The validation curves for the recall and AUC metrics are somewhat less stable than their training counterparts, but remain close to the results of the training curves.
# Model history values
hist_values = list(history4.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 4")
plt.tight_layout()
plt.show()
The below code plots the training versus testing graph for the CNN 4 model with Flow Generator. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. We can see that the training loss and validation loss decrease at a stable rate as the number of epochs increases, indicating that the model is learning and improving.
The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases.
Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the CNN 4 model with Flow Generator is improving over time, and that it is performing well on both the training and validation sets.
The behavior from the model's validation recall and AUC curve is somewhat volatile but remains close to the training curves and their end results.
There could be several reasons why the validation recall and AUC curves of the Flow Generator model are less stable than those of the previous model. It is possible that the architecture is less well suited to the augmented data, leading to less consistent performance during training. Additionally, the hyperparameters may not have been tuned as effectively for the augmented inputs, leading to more fluctuation during training. Finally, it is possible that the augmented data itself is simply harder to fit, leading to less stable performance during training.
This may also be due to overfitting. It is possible that the model is trained too well on the training data, so much so that it begins to memorize the data instead of learning from it. As a result, the model performs very well on the training data, but poorly on new, unseen data, which is what the validation set represents. The sporadic behavior of the validation loss curve could be an indication that it may not generalize well to new data.
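One simple way to quantify this kind of overfitting is to track the gap between validation and training loss per epoch; a gap that keeps widening while training loss still falls suggests memorization. The sketch below uses invented loss values purely for illustration:

```python
import numpy as np

# Hypothetical per-epoch losses (invented for illustration, not from our runs)
train_loss = np.array([0.90, 0.60, 0.40, 0.30, 0.25])
val_loss = np.array([0.95, 0.70, 0.55, 0.60, 0.70])

# Generalization gap per epoch: validation loss minus training loss
gap = val_loss - train_loss
best_epoch = int(np.argmin(val_loss))  # epoch where validation loss bottomed out

# A strictly widening gap after the best epoch is a simple overfitting signal
overfitting = bool(np.all(np.diff(gap[best_epoch:]) > 0))
```

This is essentially the signal the EarlyStopping callback acts on; with `restore_best_weights=True` it rolls the model back to the epoch where validation loss was lowest.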
# Model history values
hist_values = list(history4_4.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("CNN 4 With Flow Generator")
plt.tight_layout()
plt.show()
In comparing the CNN4 with Early Stopping and MLP heat maps, we see that the CNN4 heat map achieves 96.6% correct predictions on citrus cankers, whereas the MLP correctly predicted citrus cankers 98.06% of the time. We also see that the MLP correctly predicted black spot 97.51% of the time, as compared to CNN4, which predicted black spot correctly 98.51% of the time. The CNN4 with Early Stopping model was roughly 1.5% less accurate than the MLP model when it came to the citrus canker class, but it was 1% more accurate when it came to the black spot images.
Our model performs worse than the MLP by a very slight margin based on the heatmaps.
One reason for this could be related to the architecture of the models themselves. MLP models are typically fully connected and use a simple feedforward network, while CNN models use convolutional layers to extract features from the input data. The architecture of the CNN4 model may not have been optimal for the specific task of classifying citrus canker and black spot images, which could have led to lower accuracy compared to the MLP model.
compare_mlp_cnn(
cnn4,
mlp,
x_test,
y_test,
title_1="CNN 4",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 1ms/step 13/13 [==============================] - 0s 976us/step
In comparing the CNN4 with Flow Generator and MLP heat maps, we see that the CNN4 heat map achieves 78.6% correct predictions on citrus cankers, whereas the MLP correctly predicted citrus cankers 98.06% of the time. We also see that the MLP correctly predicted black spot 97.51% of the time, as compared to CNN4, which predicted black spot correctly 77.11% of the time. The CNN4 with Flow Generator model was roughly 19% less accurate than the MLP model when it came to the citrus canker class, and 20% less accurate when it came to the black spot images.
The MLP outperforms our model by a good margin based on the heatmaps.
This may be due to several reasons. One possibility is that the CNN's use of data augmentation techniques with the Flow Generator may have introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that early stopping prevented the CNN from fully converging to the optimal solution. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn4_flow,
mlp,
x_test,
y_test,
title_1="CNN 4 (Flow Generator)",
title_2="MLP",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 1ms/step 13/13 [==============================] - 0s 2ms/step
In comparing the CNN4 with Early Stopping and CNN4 Flow Generator heat maps, we see that the CNN4 heat map achieves 96.6% correct predictions on citrus cankers, whereas the CNN4 Flow Generator model correctly predicted citrus cankers 80.10% of the time. We also see that the Flow Generator model correctly predicted black spot 77.11% of the time, as compared to the CNN4 model, which predicted black spot correctly 98.51% of the time. The CNN4 with only Early Stopping model was roughly 16% more accurate than the Flow Generator model when it came to the citrus canker class and 21% more accurate when it came to the black spot images.
The CNN4 with only Early Stopping model outperformed the Flow Generator model by a large margin.
This may be due to several reasons. One possibility is that the data augmentation applied by the Flow Generator introduced noise or reduced the clarity of the images, leading to lower accuracy. It is also possible that the augmented data requires more epochs to converge than early stopping allowed. Further experimentation and analysis would be necessary to identify the exact reasons for the observed differences in performance.
compare_mlp_cnn(
cnn4,
cnn4_flow,
x_test,
y_test,
title_1="CNN 4",
title_2="CNN 4 (Flow Generator)",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 2ms/step
Below, we prepare the image datasets and data pipeline for training a machine learning model to classify images of citrus cankers and black spots. We first set the image size, color mode, and number of classes. We then use the Keras-provided image_dataset_from_directory function to load the training and test datasets and normalize the pixel values of the images to be between 0 and 1.
We then define a function to convert the dataset to numpy arrays and use it to create the numpy arrays x_train, y_train, x_test, and y_test. The ImageDataGenerator function from Keras is then used to augment the training data with random rotations, shifts, and flips, and the resulting generator is fit to the training data. Finally, the target labels are one-hot encoded using Keras to_categorical function.
# Reset images for compatibility with Resnet
img_width, img_height = 64, 64
img_color_mode = "rgb"
classes = {0: "citrus canker", 1: "black spot"}
n_classes = 2
train_ds = tf.keras.utils.image_dataset_from_directory(
"./train/",
labels="inferred",
label_mode="int",
class_names=None,
color_mode=img_color_mode,
batch_size=32,
image_size=(img_width, img_height),
shuffle=True,
seed=None,
validation_split=None,
subset=None,
interpolation="bilinear",
follow_links=False,
crop_to_aspect_ratio=False,
)
test_ds = tf.keras.utils.image_dataset_from_directory(
"./test/",
labels="inferred",
label_mode="int",
class_names=None,
color_mode=img_color_mode,
batch_size=32,
image_size=(img_width, img_height),
shuffle=True,
seed=None,
validation_split=None,
subset=None,
interpolation="bilinear",
follow_links=False,
crop_to_aspect_ratio=False,
)
def normalize(image, label):
"""Normalize the pixel values of the image to be between 0 and 1."""
return tf.cast(image, tf.float32) / 255.0, label
train_ds = train_ds.map(normalize)
test_ds = test_ds.map(normalize)
def process_ds_to_numpy(ds) -> tuple:
"""Returns the x, y numpy arrays from the dataset."""
x, y = [], []
for image, label in ds.as_numpy_iterator():
x.append(np.array(image, dtype=np.float32))
y.append(np.array(label, dtype=np.int32))
return np.concatenate(x, axis=0), np.concatenate(y, axis=0)
x_train, y_train = process_ds_to_numpy(train_ds)
x_test, y_test = process_ds_to_numpy(test_ds)
n_dimensions = x_train.shape[-1]
datagen = preprocessing.image.ImageDataGenerator(
rotation_range=20,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True,
vertical_flip=False,
)
# Fits the data to the generator.
datagen.fit(x_train)
# One-hot encodes the inputs
y_train_ohe = utils.to_categorical(y_train, n_classes)
y_test_ohe = utils.to_categorical(y_test, n_classes)
Found 2032 files belonging to 2 classes. Found 407 files belonging to 2 classes.
In the below section, we create a convolutional neural network (CNN) model with one convolutional layer and three dense layers.
The exact layers are: a 16-filter 3x3 Conv2D, a ReLU activation, a 2x2 max-pooling layer, a second ReLU activation followed by another 2x2 max-pooling layer, a Flatten layer, and Dense layers of 100, 50, and 2 (softmax) units.
The model is compiled with the RMSprop optimizer and mean squared error loss function, and uses recall and AUC metrics for evaluation. The model is trained on the training data for up to 150 epochs with early stopping, and the resulting model is saved as "base_model.h5". Finally, the model summary is printed.
# Recreate our previous CNN for pre-training
base_model = models.Sequential()
base_model.add(
layers.Conv2D(
filters=16,
kernel_size=(3, 3),
padding="same",
input_shape=(img_width, img_height, n_dimensions),
)
)
base_model.add(Activation("relu"))
base_model.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
base_model.add(Activation("relu"))
base_model.add(
MaxPooling2D(
pool_size=(2, 2),
)
)
# Adds 1 layer on flattened output
base_model.add(Flatten())
base_model.add(Dense(100, activation="relu"))
base_model.add(Dense(50, activation="relu"))
base_model.add(Dense(n_classes, activation="softmax"))
# Compiles the model
base_model.compile(
optimizer="rmsprop",
loss="mean_squared_error",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
# Fits the model to training data
base_model_history = base_model.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=150,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=20
)
],
)
# Save model for pre-training
keras.models.save_model(base_model, "base_model.h5")
clear_screen()
# Prints model summary
base_model.summary()
Model: "sequential_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_28 (Conv2D) (None, 64, 64, 16) 448
activation_15 (Activation) (None, 64, 64, 16) 0
max_pooling2d_18 (MaxPoolin (None, 32, 32, 16) 0
g2D)
activation_16 (Activation) (None, 32, 32, 16) 0
max_pooling2d_19 (MaxPoolin (None, 16, 16, 16) 0
g2D)
flatten_9 (Flatten) (None, 4096) 0
dense_25 (Dense) (None, 100) 409700
dense_26 (Dense) (None, 50) 5050
dense_27 (Dense) (None, 2) 102
=================================================================
Total params: 415,300
Trainable params: 415,300
Non-trainable params: 0
_________________________________________________________________
Below we define two arrays, y_predict_proba and y_predict, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained base_model. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = base_model.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict, zero_division=0))
13/13 [==============================] - 0s 5ms/step
precision recall f1-score support
0 0.99 1.00 1.00 206
1 1.00 0.99 0.99 201
micro avg 1.00 1.00 1.00 407
macro avg 1.00 1.00 1.00 407
weighted avg 1.00 1.00 1.00 407
samples avg 1.00 1.00 1.00 407
pretrained_model = keras.models.load_model("base_model.h5")
# Freeze the pretrained model's first conv layer so its weights are reused as-is
pretrained_model.layers[0].trainable = False
# Implement transfer learning
x = pretrained_model.layers[1].output
x = layers.Conv2D(
filters=64,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)(x)
x = keras.layers.Activation("relu")(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(8, activation="relu")(x)
outputs = keras.layers.Dense(2, activation="sigmoid")(x)
new_model = keras.Model(inputs=pretrained_model.input, outputs=outputs)
new_model.compile(
optimizer=keras.optimizers.Adam(),
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
new_model_history = new_model.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=100,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=25
)
],
)
clear_screen()
# Displays model summary
new_model.summary()
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_28_input (InputLayer [(None, 64, 64, 3)] 0
)
conv2d_28 (Conv2D) (None, 64, 64, 16) 448
activation_15 (Activation) (None, 64, 64, 16) 0
conv2d_29 (Conv2D) (None, 64, 64, 64) 9280
activation_17 (Activation) (None, 64, 64, 64) 0
flatten_10 (Flatten) (None, 262144) 0
dropout_12 (Dropout) (None, 262144) 0
dense_28 (Dense) (None, 8) 2097160
dense_29 (Dense) (None, 2) 18
=================================================================
Total params: 2,106,906
Trainable params: 2,106,458
Non-trainable params: 448
_________________________________________________________________
pretrained_model = keras.models.load_model("base_model.h5")
# Freeze the pretrained model's first conv layer so its weights are reused as-is
pretrained_model.layers[0].trainable = False
# Implement transfer learning
x = pretrained_model.layers[1].output
x = layers.Conv2D(
filters=64,
kernel_size=(3, 3),
padding="same",
activation="relu",
data_format="channels_last",
input_shape=(img_width, img_height, n_dimensions),
)(x)
x = keras.layers.Activation("relu")(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(16, activation="relu")(x)
outputs = keras.layers.Dense(2, activation="sigmoid")(x)
new_model_flow = keras.Model(inputs=pretrained_model.input, outputs=outputs)
new_model_flow.compile(
optimizer=keras.optimizers.Adam(),
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
new_model_flow_history = new_model_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=128),
steps_per_epoch=len(x_train) // 128,
epochs=100,
validation_data=(x_test, y_test_ohe),
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=25
)
],
)
clear_screen()
# Displays model summary
new_model_flow.summary()
Model: "model_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_28_input (InputLayer [(None, 64, 64, 3)] 0
)
conv2d_28 (Conv2D) (None, 64, 64, 16) 448
activation_15 (Activation) (None, 64, 64, 16) 0
conv2d_30 (Conv2D) (None, 64, 64, 64) 9280
activation_18 (Activation) (None, 64, 64, 64) 0
flatten_11 (Flatten) (None, 262144) 0
dropout_13 (Dropout) (None, 262144) 0
dense_30 (Dense) (None, 16) 4194320
dense_31 (Dense) (None, 2) 34
=================================================================
Total params: 4,204,082
Trainable params: 4,203,634
Non-trainable params: 448
_________________________________________________________________
Below we define two arrays, y_predict_proba and y_predict, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained transfer-learning CNN model (new_model). The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba = new_model.predict(x_test)
y_predict = np.round(y_predict_proba)
# Prints classification report
print(classification_report(y_test_ohe, y_predict, zero_division=0))
13/13 [==============================] - 0s 8ms/step
precision recall f1-score support
0 1.00 1.00 1.00 206
1 1.00 1.00 1.00 201
micro avg 1.00 1.00 1.00 407
macro avg 1.00 1.00 1.00 407
weighted avg 1.00 1.00 1.00 407
samples avg 1.00 1.00 1.00 407
Below we again define two arrays, y_predict_proba_flow and y_predict_flow, to hold the predicted probabilities and binary predictions, respectively, for the test data using the previously trained CNN model with Flow Generation and Early Stopping. The classification_report function from scikit-learn's metrics module is then used to print a report containing various classification metrics, including precision, recall, and F1-score, for each class in the one-hot encoded test labels y_test_ohe and corresponding binary predictions y_predict_flow.
# Defines the y-prediction probability and y-prediction arrays
y_predict_proba_flow = new_model_flow.predict(x_test)
y_predict_flow = np.round(y_predict_proba_flow)
# Prints classification report
print(classification_report(y_test_ohe, y_predict_flow, zero_division=0))
13/13 [==============================] - 0s 3ms/step
precision recall f1-score support
0 1.00 0.80 0.89 206
1 0.82 1.00 0.90 201
micro avg 0.89 0.90 0.89 407
macro avg 0.91 0.90 0.89 407
weighted avg 0.91 0.90 0.89 407
samples avg 0.89 0.90 0.89 407
Observing the curve for the CNN model with only Early Stopping, the loss value decreases throughout the epochs. The curve starts at ~0.6 and ends near ~0.15.
The curve for the CNN model with both the Flow Generator and Early Stopping is visually steeper, meaning its loss decreases at a faster rate. It starts at a loss value of ~0.65 and also ends near ~0.15, but at around epoch 30, whereas the other curve underwent the entire 100 epochs.
Overall, the CNN model with both Flow Generator and Early Stopping is more efficient at reducing its loss.
# Variables for determining the loss over epochs
epochs = new_model_history.epoch
loss = new_model_history.history["loss"]
epochs_flow = new_model_flow_history.epoch
loss_flow = new_model_flow_history.history["loss"]
# Plots the loss graph
plt.plot(epochs, loss, label="Transfer Learn CNN w/ Early Stopping")
plt.plot(epochs_flow, loss_flow, label="Transfer Learn CNN (Flow Generator) w/ Early Stopping")
plt.ylabel("Cost")
plt.xlabel("Epochs")
plt.title("Loss")
plt.tight_layout()
plt.legend()
plt.show()
In this run, the model correctly predicted 204 citrus canker images and 201 black spot images. The model almost achieved perfect accuracy, but it predicted two images incorrectly: it assigned the black spot label to two images that belonged to the citrus canker class.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .9951 or roughly 99% accuracy.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score, but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.0049 or ~0.5%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .9951 or 99%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .9951 or 99%.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .9951 or roughly 99%.
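The definitions above can be checked concretely. A minimal sketch that applies each formula to a hypothetical binary confusion matrix (the counts below are illustrative, not taken from this run):

```python
# Hypothetical binary confusion-matrix counts (illustrative only)
tp, fp, fn, tn = 204, 2, 0, 201

# Accuracy: correct predictions over all predictions
accuracy = (tp + tn) / (tp + fp + fn + tn)
# Misclass: complement of accuracy
misclass = 1.0 - accuracy
# Precision: true positives over all predicted positives
precision = tp / (tp + fp)
# Recall: true positives over all actual positives
recall = tp / (tp + fn)
# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} misclass={misclass:.4f}")
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With these counts, accuracy and misclass sum to exactly 1.0, and the F1-score falls between the precision and recall values, as the harmonic mean must.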
class_report = metrics.classification_report(
y_test_ohe, y_predict, output_dict=True
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="Transfer Learn CNN Confusion Matrix"
)
In this run, the model correctly predicted 189 citrus canker images and 200 black spot images. The distribution of false positives and negatives (1-17) is small and unbalanced as the model predicted more black spot images when they were actually citrus canker images.
Accuracy is a measure of the overall correctness of a model's predictions, calculated as the ratio of the number of correct predictions to the total number of predictions. For this run, our initial accuracy score is .9558 or roughly 95% accuracy. This score is slightly lower, but independently shows that the model performs very well.
Misclass, or misclassification, refers to the instances where a model's prediction does not match the actual class label of a data point. Misclassification can be measured using various metrics such as accuracy, precision, recall, and F1-score - but in this case we can easily determine misclass by calculating 1.0 minus the accuracy score. In our case, we observe the misclass as 0.0442 or 4%.
Precision is a measure of how well a model correctly identifies positive instances among the predicted positive instances. Precision is calculated as the ratio of the number of true positive predictions to the total number of positive predictions (which would be the true positive predictions plus the false positive predictions). In our case, the precision score is .9567 or roughly 96%.
Recall, also known as sensitivity or the true positive rate, is a measure of how well a model identifies all the positive instances among the actual positive instances. Recall is calculated as the ratio of the number of true positive predictions to the total number of actual positive instances (i.e., sum of true positive and false negative predictions). Our recall score for this run was .9582 or roughly 96%.
F1-score is the "harmonic mean" of precision and recall, providing a single value that balances the trade-off between precision and recall. It is calculated as 2 * (precision * recall) / (precision + recall). Our F1-score for this run was .9558 or roughly 96%.
class_report = metrics.classification_report(
y_test_ohe, y_predict_flow, output_dict=True, zero_division=0
)
plot_confusion_matrix(
metrics.confusion_matrix(
np.argmax(y_test_ohe, axis=1), np.argmax(y_predict_flow, axis=1)
),
[classes[0], classes[1]],
normalize=False,
class_results=class_report["weighted avg"],
title="Transfer Learn CNN (With Flow Generator) Confusion Matrix"
)
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the Transfer Learn CNN model. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 1.0 - which indicates that the model has excellent ability to distinguish between positive and negative cases.
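As a small self-contained illustration of how the curve is constructed, the sketch below runs scikit-learn's roc_curve and auc on made-up labels and scores (both arrays are invented for this example). Note that roc_curve expects continuous positive-class scores, not hard 0/1 predictions, so it can sweep every threshold:

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy ground-truth labels and predicted positive-class scores (illustrative)
y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.7, 0.35, 0.8, 0.65, 0.9])

# Each threshold yields one (FPR, TPR) point; auc integrates the curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", auc(fpr, tpr))
```

Here one negative example (score 0.7) outscores one positive example (0.65), so the AUC is 8/9 rather than a perfect 1.0.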
# Variables for determining the ROC/AUC
# Use the positive-class probability so the ROC sweeps all thresholds
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba[:, 1])
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("Transfer Learn CNN - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The code below evaluates and visualizes the Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) for the Transfer Learn CNN model with Flow Generator. The ROC curve is a graphical representation of the performance of a binary classifier system, showing the trade-off between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold for classification is varied. The AUC is a metric that summarizes the overall performance of the model, with a higher AUC indicating better performance.
Our AUC for this run is 0.95 - which indicates that the model has excellent ability to distinguish between positive and negative cases.
# Variables for determining the ROC/AUC
# Use the positive-class probability so the ROC sweeps all thresholds
fpr, tpr, threshold = roc_curve(y_test, y_predict_proba_flow[:, 1])
roc_auc = auc(fpr, tpr)
# Plots the ROC and AUC graph
plt.title("Transfer Learn CNN (Flow Generator) - Receiver Operating Characteristic")
plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
plt.legend(loc="lower right")
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([-.01, 1.01])
plt.ylim([-.01, 1.01])
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")
plt.show()
The below code plots the training versus testing graph for the Transfer Learn CNN model. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. We can see that the training loss and validation loss decrease at a stable rate as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the Transfer Learn CNN model is improving over time, and that it is performing well on both the training and validation sets.
We observe some jagged behavior from the model's validation recall and AUC curve, but nothing significant that should raise concern.
Jagged behavior in the validation recall and AUC curves during training can occur due to fluctuations in the data and the model's learning rate. This is a normal occurrence, especially during the early stages of training, and is usually not a cause for concern. These fluctuations may also be due to the stochastic nature of the training process, which involves randomly initializing the model's weights and biases and using stochastic gradient descent to update them based on small batches of data.
This may be due to overfitting. It is possible that the model is trained too well on the training data, so much so that it begins to memorize the data instead of learning from it. As a result, the model performs very well on the training data, but poorly on new, unseen data, which is what the validation set represents. The sporadic behavior of the validation loss curve could be an indication that it may not generalize well to new data.
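When reading such jagged curves, one common trick is to overlay a smoothed version so the underlying trend is easier to judge. A minimal sketch using scipy's savgol_filter (already imported in this notebook) on a synthetic noisy curve that stands in for a real validation-recall history:

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
epochs = np.arange(100)
trend = 1.0 - np.exp(-epochs / 20.0)            # idealized recall-like trend
noisy = trend + rng.normal(0.0, 0.05, 100)      # add epoch-to-epoch jitter

# Light smoothing: an 11-epoch window with a cubic fit preserves the trend
smooth = savgol_filter(noisy, window_length=11, polyorder=3)

print(f"residual std raw: {np.std(noisy - trend):.3f}, "
      f"smoothed: {np.std(smooth - trend):.3f}")
```

The smoothed curve deviates less from the underlying trend than the raw curve, which is exactly the property that makes it useful for eyeballing whether jaggedness hides a real plateau or decline.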
# Model history values
hist_values = list(new_model_history.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("Transfer Learn CNN Model")
plt.tight_layout()
plt.show()
The code below plots the training versus testing graph for the Transfer Learn CNN model with Flow Generator. The graph has three subplots: the first subplot shows the training and validation loss versus epochs, the second subplot shows the training and validation recall versus epochs, and the third subplot shows the training and validation AUC versus epochs.
The training versus testing graph is used to visualize the performance of the model during training. We can see that the training loss and validation loss decrease at a stable rate as the number of epochs increases, indicating that the model is learning and improving. The training recall and validation recall also increase as the number of epochs increases, indicating that the model is becoming better at identifying positive cases. Finally, the training AUC and validation AUC increase as the number of epochs increases, indicating that the model is becoming better at distinguishing between positive and negative cases.
Overall, the graph shows that the Transfer Learn CNN model with Flow Generator is improving over time, and that it is performing well on both the training and validation sets.
The behavior of the validation recall and AUC curves is much more stable here, showing less 'jaggedness' and, again, nothing significant that should raise concern.
There could be several reasons why the validation recall and AUC curves of the Transfer Learn CNN with Flow Generator are more stable compared to the previous model. The augmented batches produced by the Flow Generator may act as a regularizer, leading to more consistent and stable performance during training. Additionally, the hyperparameters of this model may have been tuned more effectively than those of the previous model, leading to better performance and less fluctuation during training.
Overfitting nevertheless remains something to watch for: a model trained too well on the training data can begin to memorize it rather than learn from it, performing very well on the training data but poorly on new, unseen data, which is what the validation set represents. The stability of the validation curves here, however, suggests the model is generalizing reasonably well.
# Model history values
hist_values = list(new_model_flow_history.history.values())
# Variables for plotting the training versus testing
train_loss = hist_values[0]
train_recall = hist_values[1]
train_auc = hist_values[2]
val_loss = hist_values[3]
val_recall = hist_values[4]
val_auc = hist_values[5]
# Plots the training versus testing graph
plt.figure(figsize=(10, 8))
plt.subplot(2, 2, (1, 2))
plt.plot(train_loss, label="Training Loss")
plt.plot(val_loss, label="Validation Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.subplot(2, 2, 3)
plt.plot(train_recall, label="Training Recall")
plt.plot(val_recall, label="Validation Recall")
plt.xlabel("Epochs")
plt.ylabel("Recall")
plt.legend()
plt.subplot(2, 2, 4)
plt.plot(train_auc, label="Training AUC")
plt.plot(val_auc, label="Validation AUC")
plt.xlabel("Epochs")
plt.ylabel("AUC")
plt.legend()
plt.suptitle("Transfer Learn CNN With Flow Generator")
plt.tight_layout()
plt.show()
The CNN model with Early Stopping correctly predicted 99.03% of the citrus canker images and 100% of the black spot images. In comparison to the base model, the CNN with Early Stopping model performed 4% better on the citrus canker images and 2% better on the black spot images.
Comparing the 99% to the 96% accuracy, both models perform very well, but the CNN with Early Stopping performs slightly better.
The CNN with Early Stopping model may have performed better due to its use of early stopping, which can prevent overfitting and improve generalization performance by stopping training once the model's performance on the validation set stops improving. Additionally, the CNN with Early Stopping model may have benefited from its architecture, which may have been better suited to the task at hand compared to the base model. Finally, it's possible that the differences in performance are simply due to random variations in the training process.
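The early-stopping rule itself is simple: track the best validation loss so far and stop once it fails to improve for a set number of epochs. A minimal pure-Python sketch of that patience logic (not Keras's implementation, which additionally supports options like restore_best_weights):

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:           # improvement: record it, reset the counter
            best = loss
            wait = 0
        else:                     # no improvement this epoch
            wait += 1
            if wait >= patience:  # patience exhausted: stop here
                return epoch
    return None                   # ran to completion without stopping

# Loss improves for four epochs, then plateaus for three -> stops at index 6
print(early_stop_epoch([0.9, 0.7, 0.5, 0.4, 0.41, 0.42, 0.43]))
```

This is why the technique guards against overfitting: training halts shortly after the validation loss stops improving, rather than continuing to fit the training set.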
compare_mlp_cnn(
new_model,
base_model,
x_test,
y_test,
title_1="New Model",
title_2="Base Model",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 1ms/step
In comparing both models, the base model performed better by the slightest of margins.
The CNN with Early Stopping and Flow Generator correctly predicted 91.26% of the citrus canker images while the base model correctly predicted 95.63% - a 4% difference. Regarding the black spot images, the CNN with Early Stopping and Flow Generator slightly outperformed the base model; 99.50% to 98.01, respectively.
compare_mlp_cnn(
new_model_flow,
base_model,
x_test,
y_test,
title_1="New Model (Flow Generator)",
title_2="Base Model",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 2ms/step 13/13 [==============================] - 0s 1ms/step
The ResNet50 architecture is a CNN model with 48 convolutional layers, one MaxPool layer, and one average pool layer. We will use this architecture to load pre-trained weights and compare the results of this model to our base model and best-performing CNN model.
# Transfer learning using Resnet50
res50_base = keras.applications.ResNet50(
weights="imagenet", # Load weights pre-trained on ImageNet.
input_shape=(64, 64, 3),
include_top=False,
) # Do not include the ImageNet classifier at the top.
res50_base.trainable = False
inputs = keras.Input(shape=(64, 64, 3))
# Keep the base model in inference mode by passing `training=False` so that
# layers such as BatchNormalization use their frozen statistics; this matters
# if the base is later unfrozen for fine-tuning.
x = res50_base(inputs, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(2, activation="softmax")(x)
# A Dense classifier head with two units (one per class)
outputs = keras.layers.Dense(2, activation="sigmoid")(x)
res50 = keras.Model(inputs, outputs)
res50.compile(
optimizer=keras.optimizers.Adam(),
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
res50_history = res50.fit(
x_train,
y_train_ohe,
batch_size=50,
epochs=200,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=3
)
],
)
clear_screen()
# Displays model summary
res50.summary()
Model: "model_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) [(None, 64, 64, 3)] 0
resnet50 (Functional) (None, 2, 2, 2048) 23587712
global_average_pooling2d_2 (None, 2048) 0
(GlobalAveragePooling2D)
flatten_14 (Flatten) (None, 2048) 0
dropout_16 (Dropout) (None, 2048) 0
dense_36 (Dense) (None, 2) 4098
dense_37 (Dense) (None, 2) 6
=================================================================
Total params: 23,591,816
Trainable params: 4,104
Non-trainable params: 23,587,712
_________________________________________________________________
# Use transfer learning with ResNet50
res50_base = keras.applications.ResNet50(
weights="imagenet", # Load weights pre-trained on ImageNet.
input_shape=(64, 64, 3),
include_top=False,
) # Do not include the ImageNet classifier at the top.
res50_base.trainable = False
inputs = keras.Input(shape=(64, 64, 3))
# Keep the base model in inference mode by passing `training=False` so that
# layers such as BatchNormalization use their frozen statistics; this matters
# if the base is later unfrozen for fine-tuning.
x = res50_base(inputs, training=False)
# Convert features of shape `base_model.output_shape[1:]` to vectors
x = keras.layers.GlobalAveragePooling2D()(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.Dense(2, activation="softmax")(x)
# A Dense classifier head with two units (one per class); sigmoid keeps the
# outputs in [0, 1], as binary_crossentropy expects probabilities
outputs = keras.layers.Dense(2, activation="sigmoid")(x)
res50_flow = keras.Model(inputs, outputs)
res50_flow.compile(
optimizer=keras.optimizers.Adam(),
loss="binary_crossentropy",
metrics=[keras.metrics.Recall(), keras.metrics.AUC()],
)
res50_flow_history = res50_flow.fit(
datagen.flow(x_train, y_train_ohe, batch_size=50),
steps_per_epoch=len(x_train) // 50,  # one pass at the generator's batch size
epochs=200,
validation_data=(x_test, y_test_ohe),
shuffle=True,
verbose=1,
callbacks=[
callbacks.EarlyStopping(
monitor="val_loss", patience=3, start_from_epoch=20
)
],
)
clear_screen()
res50_flow.summary()
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_12 (InputLayer) [(None, 64, 64, 3)] 0
resnet50 (Functional) (None, 2, 2, 2048) 23587712
global_average_pooling2d_3 (None, 2048) 0
(GlobalAveragePooling2D)
flatten_15 (Flatten) (None, 2048) 0
dropout_17 (Dropout) (None, 2048) 0
dense_38 (Dense) (None, 2) 4098
dense_39 (Dense) (None, 2) 6
=================================================================
Total params: 23,591,816
Trainable params: 4,104
Non-trainable params: 23,587,712
_________________________________________________________________
The ResNet50 model significantly underperformed in comparison to our base model. Focusing on the citrus canker class, the ResNet50 model correctly predicted 79.13% of the images while our base model correctly predicted 95.63%. With the black spot class, the ResNet50 model predicted 49.75% of the images correctly while the base model predicted 98.01% correctly.
Between the two classes, the ResNet50 model had more incorrect predictions for the black spot class (50.25%) than the citrus canker class. In other words, it predicted more images as citrus canker when they were actually black spot images. The share of images it predicted as black spot that were actually citrus canker is still noticeable at 20.87%, but the difference between the two percentages (50.25% vs. 20.87%) is significant enough to be of concern.
Overall, the base model outperformed the ResNet50 model by a large margin.
This may be because the ResNet50 model may not have been able to learn the important features of the dataset as effectively as the base model. It's possible that the ResNet50 model is not complex enough to handle the features present in the dataset or it may have overfit the training data. Additionally, the ResNet50 model may not have been tuned properly or may require more training data to perform well. It is important to conduct further analysis and experiments to understand the reasons for the underperformance of the ResNet50 model and improve its performance.
compare_mlp_cnn(
res50,
base_model,
x_test,
y_test,
title_1="New Model (ResNet 50)",
title_2="Base Model",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 0s 12ms/step 13/13 [==============================] - 0s 2ms/step
Comparing the new ResNet50 model with the Flow Generator and the CNN base model, we observe that the new ResNet50 model only makes predictions for the citrus canker class. Since the class distribution is balanced, the model has a 50% accuracy.
The base model, by comparison, correctly predicts 95.63% of the citrus canker images and 98.01% of the black spot images.
The ResNet50 model with the Flow Generator performs poorly despite its 50% accuracy; it would not be useful in real-life applications since it never predicts the other class. The base model, however, performs exceptionally well and would be ideal in a real-world setting.
One reason as to why the performance may be so poor for ResNet50 could be due to a lack of training data for the model to learn from, leading to overfitting on the training set and poor generalization to new, unseen data. Additionally, the ResNet50 model may not have been fine-tuned for the specific task of citrus disease classification, resulting in suboptimal performance compared to the base model.
compare_mlp_cnn(
res50_flow,
base_model,
x_test,
y_test,
title_1="New Model (ResNet 50) Flow Generator",
title_2="Base Model",
labels=[classes[0], classes[1]],
)
13/13 [==============================] - 1s 14ms/step 13/13 [==============================] - 0s 1ms/step